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The  goal  of  any  optimization  experiment  is  to  find  the  settings  of  the  factors 
that  can  be  controlled  which  results  in  optimal  levels  of  the  responses  of  interest.  In 
robust  parameter  design,  the  two  responses  of  interest  are  the  mean  and  variance  of 
a quality  characteristic.  In  multiple  response  optimization,  the  responses  of  interest 
are  the  quality  characteristics  of  the  product.  In  both  of  these  cases,  a quantity  that 
is  a function  of  the  estimates  of  the  responses  of  interest  is  either  maximized  or 
minimized.  A variety  of  quantities  have  been  proposed  for  robust  parameter  design 
and  multiple  response  optimization,  but  all  of  the  proposed  quantities  are  lacking  in 
some  respect-they  may  lack  intuitive  appeal,  depend  too  heavily  on  the  definition 
of  subjective  parameters,  or  fail  altogether  in  certain  situations.  In  addition,  most 
of  the  quantities  proposed  for  robust  parameter  design  cannot  be  adapted  easily 
to  multiple  response  optimization.  The  probability  that  all  of  the  responses  are 
simultaneously  within  their  upper  and  lower  specification  limits  is  a quantity  which 
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can  be  used  for  robust  parameter  design  and  multiple  response  optimization.  The 
probability  method  also  has  an  intuitive  appeal  that  will  make  it  easy  to  explain  to 
people  in  fields  outside  of  statistics.  This  method  does  not  depend  on  the  definition 
of  subjective  parameters,  and  it  works  in  all  of  the  situations  that  have  been 
addressed.  It  may  also  be  extended  to  multiple  response  robust  parameter  design, 
which  none  of  the  other  methods  has  attempted. 
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CHAPTER  1 
INTRODUCTION 

It  is  the  goal  of  any  industry  to  produce  high  quality  products  as  inexpensively 
as  possible.  The  quality  of  a product  is  usually  measured  by  physical  characteristics, 
such  as  diameter,  purity,  taste,  or  hardness.  When  these  characteristics  are  at 
specific  target  levels  or  values,  the  product  is  thought  to  be  of  high  quality.  As  the 
characteristics  deviate  from  the  target  levels,  the  quality  of  the  product  decreases. 
Therefore  producing  a high  quality  product  translates  into  producing  the  product 
with  quality  characteristics  at  specific  levels.  There  are  three  possible  situations  for 
an  individual  characteristic: 

• Target  is  best:  the  quality  characteristic  has  a target  level  that  is  most  desirable. 

For  instance,  the  diameter  of  a roller  bearing  or  viscosity  of  a fluid. 

• Larger  is  better:  the  quality  characteristic  should  be  made  as  large  as  possible. 

For  instance,  the  purity  of  a chemical  or  a car’s  gas  mileage. 

• Smaller  is  better:  the  quality  characteristic  should  be  made  as  small  as  possible. 

For  instance,  impurities  in  a chemical  or  number  of  bubbles  in  a paint  job. 

Optimizing  a product  or  process  is  a situation  found  in  many  areas  of  applied 
statistics.  A set  of  control  factors  affects  a response  of  interest,  which  is  to  be 
maximized  (or  minimized).  The  goal  of  the  experimenter  is  to  find  the  combination 
of  the  control  factors  that  does  maximize  (or  minimize)  the  response  of  interest. 
Box  and  Wilson  (1951)  introduced  the  topic  of  response  surface  methods  in  an 
attempt  to  answer  this  problem.  They  recommended  that  a series  of  experiments 
be  performed,  first  to  find  out  which  control  factors  actually  affect  the  response 
of  interest,  and  secondly  to  find  the  optimal  settings  of  the  control  factors.  The 
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optimal  settings  are  found  by  running  a response  surface  experiment  capable  of 
fitting  the  response  to  a second  order  polynomial  in  the  control  factors.  Using  some 
basic  calculus,  the  maximum  (or  minimum)  of  the  second  order  polynomial  can  be 
found.  The  situation  can  become  much  more  complicated  if  the  response  of  interest 
does  not  have  constant  variance  in  the  region  in  question,  or  if  there  is  more  than 
one  response  of  interest.  Robust  parameter  design  deals  with  the  situation  where 
there  is  a single  response  that  does  not  have  a constant  variance  in  the  region  of 
interest.  Multiple  response  optimization  deals  with  the  situation  where  there  is  more 
than  one  response  of  interest. 

Robust  Parameter  Design 

In  the  manufacturing  and  use  of  the  product,  there  will  exist  factors  that  will 
affect  the  values  of  the  quality  characteristic,  not  all  of  which  can  be  easily  controlled. 
Those  that  can  be  easily  controlled  are  referred  to  as  control  factors.  These  could  be 
the  amount  of  material  going  into  a chemical  process,  the  temperature  of  a chemical 
bath,  or  the  pressure  in  a reactor.  Those  factors  that  cannot  be  easily  controlled,  or 
are  too  costly  to  control,  are  referred  to  as  noise  factors.  These  could  be  ambient 
humidity  and  temperature,  the  speed  at  which  a customer  drives  his  car,  or  the 
temperature  at  which  a customer  bakes  his  cake. 

Optimizing  the  industrial  process  entails  finding  the  levels  of  the  control 
factors  that  will  produce  a product  that  has  the  desired  quality  characteristics.  Due 
to  natural  variability,  it  is  impossible  to  find  settings  that  will  always  give  the  same 
values  of  the  quality  characteristics,  and  so  all  we  ask  is  that  it  give  those  values 
a high  percentage  of  the  time  on  average.  This  variability  will  lead  to  decreased 
quality,  as  the  product  will  not  always  be  at  its  target  value.  One  approach  that  has 
been  traditionally  taken  is  a two  step  process.  The  first  step  gets  the  characteristic 
at  its  desired  level,  on  average.  The  second  step  is  to  find  the  sources  of  variability 
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and  either  control  them  or  eliminate  them.  Control  or  elimination  of  sources  of 
variability  can  be  difficult  and/or  costly. 

Another  approach  is  to  find  settings  of  the  control  factors  that  make  the 
product  insensitive  to  the  sources  of  variability.  This  is  the  approach  used  in  the 
area  of  parameter  design  or  robust  parameter  design.  This  approach  has  the  appeal 
that  it  is  typically  more  cost  effective  than  controlling  or  trying  to  totally  eliminate 
the  sources  of  variation.  If  levels  of  the  control  factors  can  be  found  for  which 
the  product  is  insensitive  to  the  sources  of  variation,  and  the  product’s  quality 
characteristics  can  be  made  consistently  close  to  their  desired  levels,  this  will  result 
in  a consistently  high  quality  product. 

Experiments  can  be  run  in  order  to  find  the  settings  of  the  control  factors 
that  will  result  in  a consistently  high  quality  product.  The  experiment  will  have 
the  quality  characteristics  as  the  responses  of  interest,  and  the  control  and  noise 
factors  as  the  variables  of  interest.  Notice  that  some  noise  factors  can  be  controlled 
during  experimentation,  even  though  they  cannot  be  easily  controlled  otherwise.  An 
example  of  this  would  be  the  speed  a car  is  driven  or  the  temperature  at  which  a 
cake  is  baked.  Even  if  no  noise  factors  exist  or  are  known,  an  experiment  can  still  be 
run  to  find  the  settings  of  the  control  factors  that  produce  the  smallest  variability 
while  still  achieving  a desired  level  of  the  response. 

Multiple  Response  Optimization 

Most  products  and  processes  have  more  than  one  response  of  interest,  and  all 
of  these  responses  may  depend  on  a set  of  control  factors.  These  responses  may  have 
either  an  ideal  target  value  that  is  desired,  or  a range  of  values  that  will  result  in  a 
product  that  is  of  satisfactory  quality.  These  targets  and  ranges  are  typically  given 
in  the  form  of  targets  and  upper  and  lower  specification  limits.  Often  the  product 
cannot  be  shipped  to  a customer  unless  all  of  the  individual  responses  are  within 
their  upper  and  lower  specifications.  Ideally,  there  would  exist  a combination  of  the 
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control  factors  that  simultaneously  put  all  of  the  responses  at  their  individual  target 
value.  This  does  not  often  happen,  though,  so  a set  of  control  factors  must  be  found 
that  somehow  balances  all  of  the  responses,  so  that  each  response  is  somewhat  close 
to  its  target. 

Multiple  response  optimization  usually  assumes  that  the  variability  of  the 
individual  responses  does  not  depend  on  control  factors.  It  can  be  easily  imagined, 
however,  that  situations  may  exist  where  one  or  more  of  the  responses  depend  on 
the  control  factors,  as  in  robust  parameter  design.  Experimental  situations  can 
therefore  fall  into  three  categories: 

• Robust  Parameter  Design:  There  is  only  one  quality  characteristic  of  interest, 
and  its  variability  depends  on  the  control  factors  in  the  experiment. 

• Multiple  Response  Optimization:  There  is  more  than  one  quality  characteristic 
of  interest,  but  the  variability  of  these  responses  are  not  thought  to  depend  on 
the  control  factors  in  the  experiment. 

• Multiple  Response  Robust  Parameter  Design:  There  is  more  than  one  quality 
characteristic  of  interest,  and  the  variability  of  these  responses  depends  on  the 
control  factors  in  the  experiment. 

Multiple  response  robust  parameter  design  has  not  been  dealt  with  much  in  the 
literature,  but  robust  parameter  design  and  multiple  response  optimization  have 
been  researched  extensively. 

Robust  parameter  design  and  multiple  response  optimization  have  been 
treated  as  two  separate  problems,  each  with  a unique  method  for  finding  the  best 
setting  of  the  control  factors.  All  of  the  proposed  methods  have  much  in  common, 
though.  They  all  start  with  a designed  experiment  which  is  capable  of  supporting 
the  fitting  of  the  responses  using  polynomial  models  in  the  control  settings.  Some 
or  all  of  the  fitted  responses  are  transformed  into  a univariate  function,  which  is 
maximized  (or  minimized)  over  the  experimental  region  of  interest.  The  univariate 
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functions  proposed  for  robust  parameter  design  cannot  be  easily  adapted  for  use 
in  multiple  response  optimization,  and  vice  versa.  Maximizing  the  probability  that 
the  responses  are  between  their  upper  and  lower  specifications  can  be  used  for 
both  robust  parameter  design  and  multiple  response  optimization.  In  the  following 
chapters,  I will  propose  that  this  function  be  used  for  all  three  of  the  experimental 
situations  listed  above. 


CHAPTER  2 

METHODS  OF  OPTIMIZATION 
2.1  The  Taguchi  Method 

Genichi  Taguchi  is  credited  with  introducing  robust  parameter  design  based 
on  his  work  with  various  industries  in  Japan.  His  method  is  explained  in  papers  by 
Kackar  (1985)  and  Byrne  and  Taguchi  (1987),  as  well  as  discussed  by  a panel  headed 
by  Nair  (1992).  Explanations  of  this  method  are  also  included  in  books  on  response 
surface  methodology,  such  as  those  by  Myers  and  Montgomery  (1995)  or  Khuri  and 
Cornell  (1996).  The  philosophy  behind  the  use  of  robust  parameter  designs  is  that  it 
is  more  cost  effective  to  try  and  make  a process  robust  to  sources  of  variability  than 
it  is  to  try  to  control  or  eliminate  those  sources.  This  same  philosophy  can  be  used 
to  improve  product  performance  in  the  field.  The  product  can  be  designed  in  such  a 
way  that  its  performance  is  not  sensitive  to  variability  that  it  will  encounter  during 
its  lifetime. 

Taguchi  believes  that  as  long  as  a product’s  characteristic  is  not  at  its  target 
level,  it  is  not  of  the  highest  quality.  This  is  different  from  the  belief  commonly  held 
in  some  industries  that  as  long  as  a product’s  characteristic  is  between  its  lower  and 
upper  specification  limits,  it  is  of  the  highest  quality  (see  Figure  2.1).  Taguchi’s 
belief  that  a product  can  have  lower  quality  while  still  being  within  specification 
is  the  driving  force  behind  reducing  product  variability.  It  is  noted  that  customer 
satisfaction  increases  as  product  variability  decreases. 

Robust  parameter  design  aims  to  find  the  optimal  settings  of  the  control 
parameters.  In  order  to  do  this,  a criterion  must  be  specified  that  is  to  be  optimized. 
Taguchi  recommends  optimizing  the  expected  loss,  defined  as  the  expected  value  of 
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Figure  2.1:  Loss  Functions  for  Taguchi  and  Classical  View. 
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monetary  losses  an  arbitrary  user  of  the  product  is  likely  to  suffer  at  times  during 
the  product’s  life  span  due  to  performance  variation.  This  makes  the  concept  of 
optimization  concrete  in  that  it  relates  product  performance  and  monetary  losses. 
Taguchi  recommends  using  a quadratic  loss  function, 

1(Y)  = K(Y-t)2, 

where  Y is  the  quality  characteristic,  r is  its  target  value,  and  K is  some  constant. 
The  constant  K is  determined  by  the  cost  to  repair  or  replace  a product  if  it 
performs  unsatisfactorily.  (There  are  slight  modifications  to  this  loss  function  for 
the  cases  of  larger  or  smaller  is  better.)  Then  the  criterion  that  is  to  be  optimized  is 
the  expected  loss, 

L = E[((F)]  = KE\(Y  - t)2]. 

Note  that  the  final  solution  will  not  depend  on  the  choice  of  K . 

The  actual  design  of  the  experiment  is  a design  in  the  control  factors  crossed 
with  a design  in  the  noise  factors.  If  the  control  design  consists  of  m combinations 
of  the  levels  of  the  control  factors  and  the  noise  design  consists  of  n combinations  of 
the  levels  of  the  noise  factors,  then  the  overall  design  will  have  a total  of  ran  runs. 
Notice  that  each  combination  of  control  factors  will  be  repeated  n times,  once  for 
each  combination  of  the  noise  factors.  This  methodology  serves  to  induce  noise  at 
each  combination  of  the  control  factors.  These  n runs  are  then  used  to  calculate  a 
performance  statistic.  There  will  be  ra  performance  statistic  values,  one  for  each 
combination  of  the  control  factors. 

The  performance  statistic  suggested  by  Taguchi  is  referred  to  as  a signal 
to  noise  ratio,  and  a large  value  is  desired.  Although  Kackar  notes  that  over  60 
different  signal  to  noise  ratios  are  used,  there  are  three  main  ones  that  are  used, 
depending  on  the  type  of  quality  characteristic.  They  are: 
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Smaller  is  Better: 


In  this  case,  the  target  value  is  r = 0 and  the  expected  loss  is  proportional  to 


E[(Y-  0)2]  = E[Y2]. 


The  performance  measure  is  then  estimated  by 


n 


-10  log  ^Vi/n  , 


i=l 


where  n is  the  number  of  runs  among  the  noise  factors  at  each  combination  of  levels 
of  the  control  factors. 

Larger  is  Better: 

In  this  case,  the  quality  characteristic  Y is  transformed  to  1 /Y  and  treated 
as  though  it  were  the  case  of  smaller  is  better.  Therefore  the  performance  statistic 
becomes 


n 


-10  log  5^(1  /Vi)/n 


2=1 


Target  is  Best: 

Kackar  lists  two  situations,  both  of  which  depend  on  the  assumption  that 
an  adjustment  parameter  exists.  An  adjustment  parameter  affects  the  mean  of  the 
response,  but  does  not  affect  the  variance.  Thus  the  variance  can  be  reduced,  and 
the  mean  moved  to  the  appropriate  level  by  means  of  the  adjustment  parameter. 
The  situations  of  interest  in  the  target  is  best  case  are  when  the  variance  is  not 
linked  to  the  mean  and  when  it  is. 

In  the  case  that  the  variance  is  not  linked  to  the  mean,  the  idea  is  to  minimize 
the  variance,  since  the  mean  may  be  moved  to  target  by  means  of  the  adjustment 
parameter.  For  this  reason,  the  performance  statistic  suggested  is 


n 


log(s2)  = -log  ^(Vi  - y)2/(n  - 1) 


2=1 
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where  s 2 is  the  unbiased  estimate  of  the  variance  and  y is  the  mean  response  at  that 
experimental  point. 

When  the  variance  is  linked  to  the  mean,  and  is  most  likely  positively 
correlated,  the  coefficient  of  variation  o / ji  should  be  made  small.  When  the 
coefficient  of  variation  remains  small,  any  change  in  the  mean  results  in  a relatively 
small  increase  in  the  variance.  Again  the  target  is  reached  through  adjustment 
parameters.  For  this  reason,  the  performance  statistic  suggested  is 

10log(y2/s2) 

where  y is  the  average  of  the  response  values.  Note  that  this  is  the  only  performance 
statistic  listed  here  that  can  be  legitimately  called  a function  of  the  signal  to  noise 
ratio. 

The  Taguchi  method  starts  with  the  identification  of  control  factors,  noise 
factors,  and  their  appropriate  settings.  The  parameter  design  experiment  is  designed 
and  run,  and  the  appropriate  performance  statistic  is  calculated  for  each  combination 
of  the  control  factor  settings.  The  combination  of  control  factor  settings  with  the 
best  performance  statistic  value  is  then  chosen  as  the  new  setting.  A confirmatory 
experiment  is  run  to  see  if  the  new  settings  do  indeed  lead  to  an  improved  value  of 
the  performance  statistic. 

In  a published  discussion  of  the  Kackar  paper,  Box  points  out  that  the 
important  engineering  ideas  of  Taguchi  are  accompanied  by  statistical  procedures 
that  are  often  unnecessarily  complicated  and  inefficient,  and  sometimes  naive. 

He  specifically  mentions  not  exploiting  the  sequential  nature  of  experimentation, 
limiting  the  choice  of  a design,  focusing  on  the  universal  use  of  a signal  to  noise 
ratio,  and  the  lack  of  data  transformations,  among  others.  Incorporating  some  of 
these  issues  and  others  into  the  methodology  one  uses  would  lead  to  a statistical 
refinement  of  Taguchi’s  excellent  engineering  ideas,  as  will  be  covered  later. 
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2.2  Dual  Response  Surface  Optimization 

As  previously  mentioned,  serious  concerns  were  expressed  about  the  lack  of 
use  of  statistical  methods  in  the  Taguchi  method.  Many  of  these  were  detailed  by 
Box  in  the  discussion  of  Kackar’s  paper.  An  excellent  paper  by  Myers  et  al.  (1992) 
encapsulates  many  statistical  improvements  to  the  Taguchi  method. 

Taguchi  recommends  a design  that  crosses  the  noise  design  with  the  control 
design.  Typically  this  will  result  in  a very  large  design,  unless  they  are  highly 
fractionated  designs.  When  the  control  design  is  highly  fractionated,  there  is 
typically  little  or  no  information  about  interactions,  some  of  which  may  be  very 
important.  Taguchi’s  rationale  is  that  maximum  information  must  be  gained  about 
the  control  by  noise  interactions  and  not  interactions  among  the  control  variables. 
His  designs  therefore  sacrifice  possibly  important  control  by  control  interactions  at 
the  expense  of  measuring  possibly  unimportant  control  by  noise  interactions. 

Myers  et  al.  (1992)  give  an  example  of  a Taguchi  design  with  3 control 
factors  and  2 noise  factors.  The  design  recommended  by  Taguchi  is  an  L$  for  the 
control  variables  crossed  with  a 22  factorial  for  the  noise  variables.  The  L9  design 
is  a three  level  Plackett-Burman  design  for  three  factors  and  is  given  in  Table  2.1. 
The  levels  of  the  three  variables  are  coded  by  -1  for  the  lowest  level,  +1  for  the 
highest  level,  and  0 for  the  middle  level.  When  the  design  for  the  control  variables  is 
crossed  with  the  design  for  the  noise  variables,  the  resulting  experiment  will  consist 
of  9x4=36  runs.  This  approach  gives  maximal  information  about  control  by  noise 
variable  interactions,  because  the  Taguchi  method  seeks  to  find  the  level  of  the 
control  settings  that  is  least  affected  by  the  noise  variables.  However,  often  many 
degrees  of  freedom  are  used  for  estimating  control  by  noise  interactions  that  could 
be  better  utilized  estimating  potentially  important  control  by  control  interactions. 
For  instance,  a popular  response  surface  design  in  five  factors  that  could  be  used 
is  a hybrid  design  consisting  of  a central  composite  design  in  the  control  variables 
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Table  2.1:  Lg  Taguchi  Design. 


Xi 

x2 

^3 

-1 

-1 

-1 

-1 

0 

0 

-1 

+1 

+1 

0 

-1 

0 

0 

0 

+1 

0 

+1 

-1 

+1 

-1 

+1 

+1 

0 

-1 

+1 

+1 

0 

xi,x2,  and  x3  and  the  noise  variables  ^and^,  which  is  given  in  Table  2.2.  The  first 
16  runs  evolve  from  a 25-1  fractional  factorial  in  all  five  factors,  plus  axial  runs  in 
the  three  control  variables  with  the  noise  variables  at  their  mid-level,  and  3 center 
runs  among  the  five  variables.  This  design  has  only  25  runs,  compared  to  the  36 
suggested  by  Taguchi,  and  it  has  the  flexibility  of  estimating  control  by  control 
interactions. 

The  practice  of  putting  the  control  and  noise  factors  into  a single  experimental 
design  was  first  referred  to  as  forming  a combined  array  by  Welch  et  al.  (1990)  and 
Shoemaker  et  al.  (1991).  Combining  all  of  the  factors  into  one  design  allows  for 
greater  flexibility  for  estimating  certain  effects,  such  as  important  control  by  control 
interactions.  Enough  information  will  still  be  available  for  the  control  by  noise 
interactions,  which  will  lead  to  robustness.  The  combined  array  philosophy  has  been 
almost  unanimously  endorsed  by  the  statistical  community. 

A paper  by  Vining  and  Myers  (1990)  speaks  to  many  of  the  other  criticisms  of 
the  Taguchi  method,  such  as  the  universal  use  of  a signal  to  noise  ratio,  and  limiting 
the  recommended  optimal  setting  to  one  of  the  combinations  of  the  factors  that  was 
run  in  the  experiment.  They  recommend  using  the  aforementioned  combined  array 
and  treating  the  mean  and  variance  as  separate  responses.  The  mean  and  variance 
are  each  modelled  with  a second  order  polynomial  and  each  can  be  optimized 
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Table  2.2:  Central  Composite  Design  in  5 Variables. 


Run 

X\ 

X2 

Xz 

2 1 

22 

1 

-1 

-1 

-1 

-1 

+1 

2 

+1 

-1 

-1 

-1 

-1 

3 

-1 

+1 

-1 

-1 

-1 

4 

+1 

+1 

-1 

-1 

+1 

5 

-1 

-1 

+1 

-1 

-1 

6 

+1 

-1 

+1 

-1 

+1 

7 

-1 

+1 

+1 

-1 

+1 

8 

+1 

+1 

+1 

-1 

-1 

9 

-1 

-1 

-1 

+1 

-1 

10 

+1 

-1 

-1 

+1 

+1 

11 

-1 

+1 

-1 

+1 

+1 

12 

+1 

+1 

-1 

+1 

-1 

13 

-1 

-1 

+1 

+1 

+1 

14 

+1 

-1 

+1 

+1 

-1 

15 

-1 

+1 

+1 

+1 

-1 

16 

+1 

+1 

+1 

+1 

+1 

17 

-1 

0 

0 

0 

0 

18 

+1 

0 

0 

0 

0 

19 

0 

-1 

0 

0 

0 

20 

0 

+1 

0 

0 

0 

21 

0 

0 

-1 

0 

0 

22 

0 

0 

+1 

0 

0 

23 

0 

0 

0 

0 

0 

24 

0 

0 

0 

0 

0 

25 

0 

0 

0 

0 

0 
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separately  and  simultaneously.  This  is  referred  to  as  the  dual  response  surface 
optimization  approach. 

This  approach  follows  the  dual  response  approach  of  Myers  and  Carter 
(1973),  which  is  a response  surface  technique  for  dealing  with  two  responses,  where 
generally  both  pertain  to  quality  characteristics  whose  variances  are  assumed 
constant  over  the  experimental  region.  The  responses  are  classified  as  a primary 
response  and  a secondary  response,  where  the  latter  will  act  as  a constraint.  The 
fitted  primary  response  is  given  by 

yp  = b0  + x'b  + x'Bx 

X 

and  the  fitted  secondary  response  is  given  by 

ys  = c o + x'c  + x'Cx 

where  b,  B,c,  and  C are  vectors  and  symmetric  matrices  whose  elements  are  the 
least  squares  estimators  of  the  coefficients  of  the  linear,  quadratic,  and  cross  terms 
of  the  second  order  model. 

The  object  of  the  optimization  is  to  find  conditions  on  the  elements  of  x 
that  optimize  yp  subject  to  the  constraint  that  ys  = k , where  k is  some  desirable  or 
acceptable  value  of  the  secondary  response.  Using  a Lagrangian  multiplier  /i,  they 
consider 

L = b0  + x'b  + x'Bx  — y(co  + x'c  + x'Cx  — k ) 


and  solve  for  the  x that  satisfies 
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The  solution  is 

(B  - pC)x  = i(/xc  — b). 

Much  of  what  Myers  and  Carter  present  deals  with  ensuring  that  a local  maximum 
(or  minimum)  is  actually  obtained,  and  closely  follows  the  ridge  analysis  approach 
developed  by  Hoerl  (1959)  and  Draper  (1963).  The  mathematical  details  will  not  be 
presented  here. 

Vining  and  Myers  recommend  using  the  dual  response  approach  of  Myers 
and  Carter,  using  the  mean  and  variance  as  the  two  responses.  The  three  basic 
situations  are  dealt  with  as  follows: 

• Target  is  best:  minimize  the  variance  a2,  subject  to  the  constraint  that  the 
mean  p is  held  at  a specified  target  value.  Here  the  primary  response  is  the 
variance,  and  the  secondary  response  , or  constraint,  is  the  mean. 

• Larger  is  better:  maximize  the  mean  p,  subject  to  the  constraint  that  the 
variance  cr2  is  at  some  acceptable  level. 

• Smaller  is  better:  minimize  the  mean  p,  subject  to  the  constraint  that  the 
variance  a2  is  at  some  acceptable  level.  In  this  and  the  previous  case,  the 
primary  response  is  the  mean,  and  the  constrained  response  is  the  variance. 

Myers  and  Carter  showed  that  a solution  can  always  be  found  if  the  optimization  is 
performed  on  the  surface  of  a sphere  of  radius  p,  thus  giving  an  additional  constraint 
of  x'x  = p2.  Del  Castillo  and  Montgomery  (1993)  showed  that  this  additional 
constraint  can  be  dispensed  with  by  using  a standard  nonlinear  programming 
technique  called  the  generalized  reduced  gradient  (GRG)  algorithm,  which  will  be 
described  later. 

Vining  and  Myers  recommend  fitting  the  two  responses 


yp  = b0  + x'b  + x'  Bx 
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ys  = c0  + x'c  + x'Cx 

as  in  the  dual  response  situation  of  Myers  and  Carter.  The  experiment  is  set  up 
using  an  n-point  response  surface  design  that  is  capable  of  supporting  the  fit  of 
a second  order  model.  Some  designs  that  are  presented  are  a central  composite 
design  or  a 33  factorial.  Details  of  these,  and  other  designs  can  be  found  in  books 
on  response  surface  methodology,  such  as  Myers  and  Montgomery  (1995)  or  Khuri 
and  Cornell  (1996).  This  design  is  then  replicated  m times  to  obtain  an  estimate 
of  the  variance  at  each  design  point.  Ideally  this  replication  would  be  in  the  noise 
parameters,  as  in  the  Taguchi  method,  but  any  scheme  with  proper  randomization 
would  be  adequate.  The  average  response  yi  and  the  sample  standard  deviation 
Si  are  calculated  for  i = 1, . . . , m,  and  then  fit  to  the  primary  and  secondary 
response  models  above.  The  dual  response  system  is  then  optimized  using  the  GRG 
algorithm,  which  can  be  found  in  many  software  packages,  such  as  Microsoft  Excel’s 
solver  command. 

This  approach  answers  several  of  the  statistical  issues  raised  by  critics  of  the 
Taguchi  method.  It  allows  for  sequential  experimentation,  where  an  experiment 
or  series  of  experiments  is/are  performed  to  locate  a favorable  region,  followed  by 
another  experiment  to  locate  the  optimal  settings  in  that  area.  The  approach  allows 
for  a variety  of  designs  which  may  include  estimating  interactions  among  the  control 
variables.  Both  the  mean  and  the  variance  are  separately  modeled,  allowing  for  a 
better  understanding  of  the  process  to  be  acquired  while  avoiding  the  use  of  a signal 
to  noise  ratio. 

2.3  Alternative  Methods  of  Dual  Response  Surface  Optimization 

Several  alternatives  to  the  optimization  methods  of  Vining  and  Myers,  as 
augmented  by  Del  Castillo  and  Montgomery,  have  been  suggested.  These  deal 
mainly  with  the  perceived  problem  that  the  constraints  on  the  secondary  response 
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are  unnecessarily  restrictive,  mainly  in  the  target  is  best  scenario.  The  main 
suggestion  deals  with  the  secondary  constraint  that  the  fitted  mean  be  at  a certain 
target.  If  some  bias,  defined  as  the  distance  that  the  fitted  mean  is  away  from  the 
target,  is  allowed,  new  settings  of  the  control  factors  may  result  in  an  even  lower 
variance.  This  must  be  done  with  caution,  however,  as  too  much  bias  will  not  be 
good  for  the  overall  optimization  of  the  product.  Though  it  has  not  been  mentioned 
in  the  literature,  in  the  other  two  cases  of  larger  or  smaller  is  better,  the  constraint 
of  setting  the  variance  at  an  acceptable  level  may  also  lead  to  problems,  which  will 
be  detailed  later. 

All  of  the  following  methods  begin  the  same  way  as  Vining  and  Myers.  A 
response  surface  experiment  is  designed  that  will  allow  both  the  mean  and  variance 
(or  a function  of  the  variance)  to  be  modeled  by  a second  order  model  in  the  p 
control  variables.  In  most  cases,  the  standard  deviation  is  used  rather  than  the 
variance,  but  another  common  transformation  of  the  variance  is  a log  transform. 
The  transformation  is  merely  meant  to  allow  a better  fit  to  the  model  form.  The  fit 
of  the  mean  of  the  response  is  given  by 

p p p 

Wfj,=  flo  + PiXi  + 


i= 1 i= 1 j>i 

and  the  standard  deviation  of  the  response  is  given  by 

p p p 


wa 


— 7o  + 7 ixi  + 


7 ijX 


i%j  i 


i= 1 


i= 1 j>i 


A 

where  the  /3  and  7 are  the  least  squares  estimates  of  the  true  parameters.  Once 
the  models  are  fit,  one  or  both  of  the  response  measures  is  to  be  maximized  or 
minimized  in  order  to  determine  the  optimal  setting  of  the  control  factors.  Vining 
and  Myers  recommended  using  a constrained  maximization  (or  minimization)  of  the 
two  responses,  as  described  above.  Other  people  have  recommended  that  different 
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quantities  be  maximized  (or  minimized),  or  that  a modified  approach  be  taken  to 
the  constrained  optimization. 

Lin  and  Tu  (1995)  recommend  minimizing  a squared  error  loss  expression 
defined  as 


MSE  = « - T )2  + u>l 

where  w A and  w2  are  the  fitted  mean  and  variance  from  the  two  responses  surface 
models,  respectively,  and  T is  the  target  value  for  the  mean.  This  allows  some 
deviation  from  the  target  level,  or  bias,  to  enter  into  the  proposed  solution, 
but  it  should  not  be  too  large,  since  the  overall  MSE  must  be  minimized.  The 
maximization  may  also  be  performed  using  the  GRG  algorithm  that  is  recommended 
for  the  method  of  Vining  and  Myers. 

Lin  and  Tu  claim  that  the  advantages  of  their  approach  over  that  of  Vining 
and  Myers  are  that  it  is  not  restricted  to  fitting  full  second  order  models  and  it 
may  lead  to  a better  overall  solution.  By  not  being  restricted  to  a full  second  order 
model,  the  experimenter  can  use  common  regression  techniques  to  find  the  simplest 
model  for  the  responses.  In  fact,  the  two  responses  do  not  even  have  to  have  the 
same  model  form.  In  actuality,  the  use  of  the  GRG  algorithm  for  solving  the  method 
of  Vining  and  Myers  is  not  restricted  to  a full  second  order  model  either.  Lin  and 
Tu  also  mention  that  more  flexibility  may  be  gained  by  minimizing  the  quantity 

MSE*  = X1(w^-T)2  + X2wl 

where  A*  are  prespecified  constants  such  that  0 < Aj  < 1 and  Ai  + A2  = 1.  This 
would  allow  more  freedom  for  the  experimenter,  by  enabling  the  mean  or  variance 
to  take  on  different  weights  in  the  final  solution. 

Copeland  and  Nelson  (1996)  propose  a slightly  different  formulation  to  the 
problem  as  part  of  a paper  about  computer  algorithms  for  finding  a maximum  or 
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a minimum  in  a region.  For  the  target  is  best  case,  they  suggest  minimizing  the 
variance  subject  to  the  bias,  or  distance  the  mean  is  from  target  T,  being  less 
than  some  A,  which  is  specified  by  the  experimenter.  This  can  be  stated  as 

Minimize  w2 

subject  to  (Wfj,  — T)2  < A2. 

This  allows  for  the  mean  to  deviate  from  the  target  level,  but  places  an  upper  limit 
on  the  amount  of  deviation.  For  the  larger  or  smaller  cases,  they  suggest  optimizing 
the  mean  w ^ while  keeping  the  variance  w2  less  than,  rather  than  equal  to,  some 
level.  This  is  really  a minor  modification  to  that  of  Vining  and  Myers,  since  the  final 
solution  typically  has  the  variance  at  the  defined  limit  anyway. 

Copeland  and  Nelson’s  method  answers  a concern  with  the  method  suggested 
by  Lin  and  Tu,  namely  that  the  mean  could  deviate  too  far  from  the  target  level, 
which  is  contrary  to  Taguchi’s  original  idea  of  keeping  the  response  close  to  target 
as  often  as  possible.  By  placing  an  upper  limit  on  the  amount  that  the  mean  can 
deviate  from  the  target  level,  this  method  allows  some  deviation,  but  not  too  much. 
Their  modification  in  the  larger  or  smaller  case  is  an  improvement  over  the  use  of 
Lagrangian  multipliers  suggested  by  Vining  and  Myers,  but  is  really  no  different 
than  that  suggested  by  Del  Castillo  and  Montgomery.  The  linear  programming 
solution  does  not  explicitly  use  Lagrangian  multipliers  and  is  also  not  restricted  to 
equality  contraints.  Copeland  and  Nelson  also  note  that  the  solution  usually  has  the 
variance  equalling  the  upper  limit,  so  little  is  gained  by  using  an  inequality  rather 
than  an  equality. 

Kim  and  Lin  (1998)  propose  a solution  based  on  fuzzy  optimization 
methodology.  They  map  the  mean  and  variance  to  membership  functions,  and  then 
each  of  these  two  functions  are  mapped  to  a single  value  which  is  optimized.  The 
membership  function  lies  between  0 and  1,  and  values  close  to  1 are  better.  The 
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optimal  solution  is  the  set  of  control  settings  that  maximizes  the  minimum  of  the 
two  membership  functions. 

The  general  case  of  the  membership  function  is  given  by 
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< 
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ed_ed\z\ 

ed—l 


if  d 7^  0 
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where  d is  a shape  parameter  (see  Figure  2.2  for  a graph  of  the  membership  function 
for  different  choices  of  d),  and  where  and  za  are  defined  as 
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The  shape  parameter  d is  a measure  of  how  quickly  the  value  of  m(z)  changes  along 
the  0 to  1 scale  as  2 moves  away  from  0.  As  d increases  (from  large  negative  to  large 
positive),  771(2)  changes  slowly  as  2 moves  away  from  0.  Note  that  2 moving  from  0 
to  1 is  the  same  as  the  response  moving  from  the  target  to  its  upper  or  lower  limit. 
In  other  words,  if  it  is  critical  that  the  mean  (or  variance)  be  close  to  target,  a large 
negative  value  of  d should  be  used.  If  it  is  not  critical  that  the  mean  (or  variance) 
be  close  to  target,  then  a large  positive  value  of  d should  be  used  (see  Figure  2.2). 
This  gives  the  experimenter  some  flexibility  to  add  more  weight  to  either  the  mean 
of  the  response  or  the  variance  of  the  response. 
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Figure  2.2:  Membership  Function  for  Some  Choices  of  Shape  Parameters. 

The  optimal  setting  is  the  values  of  the  control  variables  x that  maximizes 
the  minimum  of  the  two  membership  functions,  or  it  may  be  stated  as: 

maximize  A (2.1) 

subject  to  m(Zfj,)  > A 

m(za)  > A 

Since  m(z ) is  close  to  1 when  2 is  close  to  0,  this  implies  that  the  best  setting  will 
be  the  one  that  gets  both  m(zIJ/)  and  m(za ) as  close  to  1 as  possible.  When  this 
happens,  both  the  mean  w ^ and  the  standard  deviation  wa  should  be  close  to  their 
targets,  where  the  target  for  the  standard  deviation  wa  is  usually  0. 

A brief  explanation  of  fuzzy  set  theory  may  help  motivate  the  method 
proposed  by  Kim  and  Lin.  A good  discussion  of  the  subject  from  a statistical 
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Table  2.3:  Membership  Values 


X 

0 

1 

2 

3 

4 

5 

6 

7 

8 

9 

Set  A 

1 

1 

1 

1 

0 

0 

0 

0 

0 

0 

Set  B 

1 

0.9 

0.7 

0.5 

0.2 

0 

0 

0 

0 

0 

point  of  view  can  be  found  in  Laviolette  et  al.  (1995).  Fuzzy  set  theory  assumes 
that  elements  have  some  membership  in  a set.  If  the  membership  is  0,  the  element 
is  never  in  the  set.  If  the  membership  is  1,  the  element  is  always  in  the  set.  As 
membership  increases  from  0 to  1,  the  possibility  that  it  is  in  the  set  increases.  For 
instance,  if  the  elements  of  our  universe  are  the  digits  from  0 to  9,  we  can  define 
two  sets,  A = (x  : x < 3)  and  B — (x  : x is  small).  The  membership  in  each  set 
for  each  element  is  given  in  Table  2.3.  Set  A is  referred  to  as  a crisp  set,  since  all 
of  the  membership  values  are  0 or  1.  Traditional  set  theory  deals  with  crisp  sets. 
Set  B is  a fuzzy  set,  since  the  membership  values  take  values  other  than  0 or  1. 
Membership  can  be  thought  of  as  being  a conditional  probability,  i.e.  P(x  is  small 
given  x = i)  gives  the  membership  values  for  i = 0, 1, ...,  9.  The  membership  values 
for  the  intersection  of  two  sets  is  given  by  the  minimum  of  the  membership  values  of 
the  two  sets.  Similarly,  the  union  of  two  sets  uses  the  maximum. 

The  method  of  Kim  and  Lin  defines  two  fuzzy  sets,  A=(The  mean  is  at 
a good  level)  and  B=(The  standard  deviation  is  at  a good  level).  It  then  finds 
the  value  of  x that  gives  the  greatest  possibility  of  being  in  the  intersection  of  A 
and  B,  or  AP|B=(The  mean  and  standard  deviation  are  at  good  levels).  Since  the 
intersection  of  two  fuzzy  sets  uses  the  minimum  of  the  two  membership  values,  their 
method  is  maximizing  the  minimum  of  the  two  membership  values  that  they  define. 

2.4  Criticisms  of  the  Various  Optimization  Schemes 

All  of  the  optimization  schemes  have  shortcomings  that  should  be  addressed. 
The  method  suggested  by  Vining  and  Myers  has  already  been  mentioned  as  being 
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too  restrictive  on  its  secondary  response.  The  issue  in  the  target  is  best  case  of  not 
allowing  any  bias  has  been  addressed.  In  the  larger  or  smaller  is  better  case,  the 
restriction  of  the  variance  being  equal  to  some  acceptable  level  has  not  really  been 
addressed.  Copeland  and  Nelson  change  the  restriction  that  the  variance  be  less 
than  or  equal  to  some  level,  but  admit  that  the  final  solution  usually  has  it  equalling 
that  level  anyway.  By  dictating  what  is  an  acceptable  level  of  variance,  one  may 
find  a solution  that  is  not  the  best  possible.  Vining  and  Myers  use  a number  of 
different  acceptable  levels  and  suggest  that  the  experimenter  compare  the  proposed 
solutions  and  choose  the  one  they  think  is  best.  No  methodology  is  given  as  to  how 
to  compare  the  proposed  solutions,  though.  In  the  larger  is  better  case,  it  is  difficult 
to  compare  the  following  two  solutions: 

Wp  wa 
Solution  A 557.9  60 
Solution  B 687.5  75 

It  is  unclear  whether  the  larger  mean  in  solution  B sufficiently  offsets  its  increased 
standard  deviation. 

The  method  suggested  by  Lin  and  Tu  does  have  the  statistical  appeal  of 
minimizing  squared  error  loss.  For  the  target  is  best  case,  this  method  is  difficult  to 
criticize.  For  the  larger  or  smaller  is  better  case,  though,  a difficulty  arises  in  how 
to  define  bias,  since  there  is  no  explicit  target.  For  the  smaller  is  better  case,  they 
suggest  a target  of  0,  which  is  consistent  with  what  Taguchi  suggests.  This  may 
not  lead  to  a good  solution  if  the  mean  cannot  legitimately  attain  0.  If  the  mean  is 
approximately  100  and  the  standard  deviation  is  approximately  10,  then  the  bias 
term  will  dominate  the  variance  term,  meaning  the  solution  will  be  driven  by  bias, 
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with  variance  largely  ignored.  For  instance,  take  the  following  two  solutions: 

wa  MSE 

Solution  A 100  10  10100 

Solution  B 90  40  9700 

According  to  this  method,  solution  B would  be  chosen  by  virtue  of  a smaller  MSE, 
even  though  the  standard  deviation  is  four  times  that  of  solution  A.  This  probably 
would  not  be  an  acceptable  choice  of  solutions.  No  mention  is  made  of  the  larger  is 
better  situation,  as  far  as  the  definition  of  bias  is  concerned,  thus  they  do  not  even 
address  this  case.  An  experimenter  could  conceivably  choose  some  artificial  target, 
but  then  the  solution  may  depend  too  heavily  on  the  choice  of  this  target. 

A criticism  with  the  method  of  Copeland  and  Nelson  is  the  issue  of  how  much 
bias  is  acceptable.  This  is  similar  to  the  criticism  of  Vining  and  Myers  with  regard 
to  how  much  variance  is  acceptable.  As  with  Vining  and  Myers,  one  could  suggest 
a number  of  possible  solutions  and  pick  the  best  one,  but  no  method  is  given  for 
comparing  the  possible  solutions.  It  is  not  a simple  task  to  compare  a solution  that 
results  in  a bias  of  1 and  wa  = 45.20  with  a solution  that  results  in  a bias  of  5 and 
wa  = 44.73.  It  is  unclear  whether  the  lower  standard  deviation  in  the  second  case 
compensates  for  the  larger  bias. 

The  main  criticism  with  the  method  of  Kim  and  Lin  is  that  too  many 
parameters  have  to  be  specified  for  the  membership  functions.  They  require  targets, 
upper  and  lower  limits,  and  shape  parameters  to  be  specified,  all  of  which  can 
greatly  affect  the  final  solution.  It  is  this  dependence  on  so  many  parameters  that 
can  lead  to  the  belief  that  the  parameters  drive  the  solution  more  than  the  actual 
data  do.  As  will  be  shown  later,  the  final  solution  can  be  highly  dependent  on  the 
choice  of  these  parameters. 

Another  criticism  of  the  method  of  Kim  and  Lin  is  that  it  is  unnecessarily 
complicated,  and  its  solution  is  in  terms  of  a quantity  A from  equation  2.1  that  has 
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no  physical  interpretation.  If  a statistician  reports  that  the  optimal  settings  result 
in  A = 0.26,  the  engineer  will  most  likely  not  know  whether  that  is  a good  or  a bad 
result.  I believe  a simpler  approach  having  a more  intuitive  interpretation  would  be 
much  preferred. 

2.5  Multiple  Response  Optimization 

For  experimental  situations  where  there  is  more  than  one  quality  response 
of  interest,  all  of  which  depend  on  the  settings  of  a group  of  control  variables,  a 
number  of  methods  for  determining  the  optimal  set  of  control  variables  have  been 
suggested.  If  the  variance  of  the  responses  does  not  depend  on  the  settings  of  the 
control  variables,  then  a response  surface  approach  can  be  used  in  the  optimization. 
The  optimization  experiment  will  begin  by  running  an  experiment  capable  of  fitting 
a second  degree  polynomial  in  the  p control  factors  x = (aq, . . . ,xp)',  such  as  a 
central  composite  design,  face  centered  cube,  or  a 3P  factorial.  These  experiments 
are  discussed  in  books  on  response  surface  methodology,  such  as  those  by  Myers  and 
Montgomery  (1995)  or  Khuri  and  Cornell  (1996).  At  each  of  the  n design  points, 
the  r responses  are  observed  as  with  i = 1, . . . , r and  j = 1, . . . , n.  Each  of  the 
responses  is  then  individually  fit  to  a full  second  order  response  surface  model,  given 

by 

p p p 

in  — Ao  + Pijxj  + 

j= 1 j= 1 k>j 

A 

where  the  f3  are  the  usual  least  squares  estimates  of  the  true  regression  parameters. 
At  this  point,  a univariate  function,  say  (j>(x),  is  constructed,  which  is  a function 
of  the  control  variables  x through  the  fitted  responses  i/j.  The  optimal  solution 
is  the  set  of  control  factors  x that  maximizes  (or  minimizes,  depending  on  the 
function)  the  function  <p(x).  The  difference  between  the  proposed  multiple  response 
optimization  methods  lies  in  the  definition  of  the  function  <f>(x)  to  be  optimized. 


/ , / y Pijkxjxk 


26 


Derringer  and  Suich  (1980)  proposed  a desirability  function  for  the  case  of 
optimizing  several  responses  simultaneously.  For  the  ith  response,  let  the  individual 
desirability  di  be  a transformation  of  the  fitted  value  y where  0 < di  < 1.  The 
value  of  di  increases  as  the  desirability  of  the  corresponding  response  increases. 
The  overall  desirability  of  the  r responses  is  the  geometric  mean  of  the  individual 
desirabilities, 


\i=l  / 

Note  that  D is  also  between  0 and  1,  and  that  if  any  individual  di  is  0,  then  D 
will  be  0.  That  is,  if  any  individual  response  is  unacceptable,  the  entire  product  is 
unacceptable.  This  is  consistent  with  much  of  industry’s  needs  and  demands  where 
a product  must  meet  all  of  its  specifications,  or  it  will  be  rejected. 

The  individual  desirability  for  the  case  of  target  is  best,  where  the  target  for 
the  Ith  response  is  T»  and  the  lower  and  upper  specifications  are  and  y™ax , is 

given  by 


di  = < 


mm  <yt<Ti 


Ti<iii<  y\ 


max 

i 


o 


Vi  < y?m  or  Vi  > y. 


max 


where  s and  t are  shape  parameters.  Figure  2.3  shows  the  desirability  function  for 
some  choices  of  s and  t.  Typically  a shape  factor  of  1 is  used,  but  there  are  situations 
where  a value  other  than  1 may  be  used.  One  situation  would  be  if  a product  could 
be  inexpensively  reworked  if  a quality  characteristic  was  too  high,  but  would  have 
to  be  scrapped  if  it  was  too  low.  In  this  case,  if  s was  chosen  to  be  large  and  t was 
chosen  to  be  small,  the  desirability  function  would  be  larger  when  the  response  was 
above  the  target  level,  which  would  favor  the  response  being  at  or  above  target. 
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Figure  2.3:  Two  Sided  Desirability  Function  for  Selected  Shape  Parameters. 
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Other  choices  for  the  shape  parameters  could  be  thought  of  as  weighting  factors  for 
responses  that  are  more  or  less  important  than  other  responses. 

In  the  case  of  larger  is  better,  where  the  lower  specification  limit  is  y™m,  an 
upper  limit  yf10*  still  must  be  specified.  This  upper  limit  may  be  thought  of  as  the 
level  above  which  there  is  no  added  value  to  the  product.  This  upper  level  in  essence 
becomes  the  target  for  the  response.  In  the  case  of  smaller  is  better,  the  negative  of 
the  response  is  used  and  treated  as  though  it  were  a larger  is  better  situation.  The 
desirability  function  for  larger  is  better  is  given  by 


/ 


0 


Vi  < y 


min 


Vi~y 


min 


y 


max 


■y 


min 


ltmin  ^ n.  < n.max 

Vi  — Ui  — Ui 
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Ui  > y 


max 


where  s is  a shape  parameter.  Figure  2.4  shows  this  function  for  several  choices  of 
s.  Note  that  the  desirability  function  does  not  need  all  of  the  responses  to  have  the 
same  functional  form. 

The  overall  desirability  D depends  on  the  location  x in  the  region  of  interest. 
The  overall  optimum  is  found  by  maximizing  D with  respect  to  x.  This  has  an 
intuitive  appeal  that  the  optimal  choice  of  the  settings  of  the  control  variables  x 
is  the  location  that  is  most  desirable,  i.e.  has  the  highest  value  of  the  desirability 
function  D.  There  are  some  disadvantages  to  this  method  that  must  be  addressed. 
The  desirability  function  D is  treated  as  being  a function  of  only  x , but  it  is,  in 
fact,  also  a function  of  the  choices  of  the  upper  and  lower  specification  limits,  the 
targets,  and  the  shape  parameters  for  each  of  the  individual  responses.  In  addition, 
the  responses  are  treated  as  being  independent,  since  any  correlation  among  the 
responses  is  not  taken  into  account. 

Khuri  and  Conlon  (1981)  proposed  a distance  measure  from  some  ideal 
optimum,  where  the  distance  is  relative  to  the  estimated  variance  of  the  responses. 
The  main  advantage  to  this  method  is  that  it  incorporates  the  possible  correlations 
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Figure  2.4:  One  Sided  Desirability  Function  for  Selected  Shape  Parameters. 
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among  the  responses,  which  the  desirability  method  does  not.  The  first  step  in  their 
method  is  to  define  a vector  of  ideal  targets  for  the  responses  0.  The  individual 
elements  of  the  ideal  target  vector  0;  are  defined  by  the  actual  target  in  the  case 
of  target  is  best,  or  by  the  predicted  maximum  or  minimum  that  can  be  obtained 
in  the  region  of  interest  in  the  case  of  larger  or  smaller  is  better,  respectively.  If 
the  target  vector  is  an  estimate  of  the  true  target  vector,  then  it  should  be  denoted 
by  0.  Obviously,  if  there  exists  some  x that  simultaneously  achieves  all  of  the 
individual  targets  at  the  same  time,  it  is  the  ideal  optimum.  Since  this  rarely  occurs, 
the  x that  minimizes  the  vector  distance  from  this  ideal  optimum  is  defined  to  be 
the  optimal  set  of  operating  conditions. 

Before  defining  the  distance  function,  other  quantities  must  be  defined  and 
estimated.  The  linear  model  for  an  individual  response  is  written  in  the  form: 

y%  X ® •••> 

where  yi  is  the  vector  of  observations  of  the  ith  response,  /3i  is  a vector  of  q 
unknown  regression  parameters,  e*  is  a vector  of  random  errors  associated  with  the 
ith  response,  and  X is  an  nxq  full  column  rank  matrix  of  constants  obtained  from 

A 

the  experimental  design.  The  fitted  ith  response  is  of  the  form  yi(x)  = z'(x)f3 
where  z(x)  is  a vector  of  dimension  q,  whose  first  element  is  one  and  the  remaining 
elements  consist  of  powers  and  cross-products  of  control  variables  Xi,x2,...  ,xp. 
Note  that  this  method  assumes  that  all  of  the  responses  use  the  same  functional 
form.  The  matrix  of  the  observations  from  the  r responses  is  denoted  by  Y,  and 
the  variance-covariance  matrix  is  denoted  by  S.  An  unbiased  estimate  of  the 
variance-covariance  matrix  is  given  by 

S = Y'[In  - X (X1  X)-1  X']Y / (n  - q), 
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where  In  is  an  identity  matrix  of  order  nxn  (Gnanadesikan  1977,  for  instance).  The 
distance  function  is  then  defined  as 


p[y(*),0]  = 


[(y(x)  - 1(y(x)  - <f>) 

z'(x)(X'X)-1z{x) 


The  optimal  solution  is  given  by  the  set  of  the  control  factors  x which  minimizes  the 
distance  p[y(x),4>]. 

The  method  given  above  is  an  attempt  to  measure  the  distance  away  from 
target  in  terms  of  the  variability  associated  with  the  estimated  responses.  Since  the 
variance  associated  with  the  fitted  responses  Ey  is  given  by 

Ey  = z\x){X'X)-1z{x)E , 


equation  2.2  can  be  rewritten  as 


p[y(*),  </>]  = (y(*)  - (y(*)  - 4>) 


1 1/2 


(2.3) 


A A 

where  Ey  is  Ey  with  E substituted.  Equation  2.3  shows  the  relationship  to  a 
distance  measure  in  terms  of  the  variability,  just  generalized  to  a multivariate 
situation.  An  estimate  of  the  distance  of  the  fitted  responses  from  their  targets 
is  given  by  y(x)  — <p,  and  it  is  scaled  by  an  estimate  of  the  variance  of  the  fitted 

A 

responses  Ey. 

In  the  case  where  the  variance-covariance  matrix  can  be  assumed  to  be 
diagonal,  i.e.,  no  correlation  exists  among  the  responses,  the  above  calculations  can 


be  simplified  somewhat.  Equation  2.2  can  be  simplified  to 


P2[y(x),<j>]  = 


i= 1 


(&(*)  - <i>i y 


_aiiz'(x)(x'x)-1z(x)_ 


1/2 


(2.4) 


A 

where  a a is  the  ith  diagonal  element  of  E.  Khuri  and  Conlon  also  talk  about  the 


fact  that  the  target  vector  <p  may  be  a random  variable,  due  to  the  fact  that  the 
targets  may  be  estimated  from  the  experiment.  This  occurs  when  a response  is 
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maximized  or  minimized,  which  means  that  the  target  is  the  estimated  maximum  or 
minimum  that  may  be  achieved  in  the  experimental  region.  Dealing  with  a random 
target  vector  greatly  complicates  the  analysis,  and  will  not  be  dealt  with  here. 

Ames  et  al.  (1997)  propose  that  a quality  loss  function  be  used  in  order  to 
optimize  several  responses  simultaneously.  Their  method  is  somewhat  similar  to 
that  of  Khuri  and  Conlon,  though  it  can  be  thought  of  as  being  both  more  general 
and  more  restrictive.  A quality  loss  function  is  defined  by 

r 

QLP(*)  = ^2  v>i(yi(x)  - Ti )2,  (2.5) 

i=  1 

where  jji{x)  is  the  value  of  an  individual  response  evaluated  at  a given  x , T* 
is  the  target  for  the  ith  response,  and  Wi  is  a prespecified  weighting  factor.  It 
is  assumed  that  the  target  vector  is  selected  by  the  experimenter,  and  may  be 
considered  constant.  The  optimal  point  is  the  value  of  x that  minimizes  the  quality 
loss  function.  Note  that  the  loss  function  in  equation  2.5  is  of  the  same  form  as 
equation  2.4,  with  </>,  replacing  T)  as  the  target  and  the  weight  Wi  being  equal  to 
[1 / 'auz' {x)(X' X)-1  z(x)]1^ . By  not  dealing  with  the  entire  covariance  matrix, 
Ames’  method  is  more  restrictive  than  that  of  Khuri  and  Conlon.  But  by  being 
more  flexible  in  the  possible  choices  of  the  weighting  factors  Wi,  it  is  less  restrictive 
than  that  of  Khuri  and  Conlon. 

Vining  (1998)  presents  a more  general  squared  error  loss  function  that  does 
account  for  the  possible  correlation  among  the  responses  and  reduces,  in  a special 
case,  to  the  distance  measure  proposed  by  Khuri  and  Conlon.  The  loss  function  that 
he  introduces  is  of  the  form 


L = [£(*)  - 0}'C[y(x)  - 0], 

where  C is  an  appropriate  positive  definite  matrix  of  costs  or  weights  and  0 is  the 
vector  of  target  values,  assumed  to  be  known.  The  overall  optimum  is  the  location 
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of  x which  minimizes  the  expected  value  of  the  loss  function,  which  is  estimated  by 

E (L)  = [y(x)  - 9}'C[y(x)  -9}  + trace[CS^(aj)],  (2.6) 

where  'Ey(x)  is  the  variance-covariance  matrix  of  the  fitted  vector  y.  The  first  term 
of  equation  2.6  represents  the  penalty  imposed  for  any  predicted  value  that  does  not 
simultaneously  meet  all  of  the  target  values,  while  the  second  term  represents  the 
penalty  imposed  by  the  quality  of  the  prediction. 

One  serious  concern  with  this  method  is  that  the  choice  of  the  cost  matrix 
C can  be  somewhat  subjective.  Vining  offers  some  different  choices  that  the 
practitioner  may  use.  One  choice  would  be  to  use  C = [ Sy  (*)]  x,  which  would 
make  the  term 


trace[CSy(x)]  = trace[Ir] 


which  is  a constant  for  all  values  of  x.  Thus,  minimizing  E(L)  is  equivalent  to 
minimizing  the  distance  measure  suggested  by  Khuri  and  Conlon  in  equation  2.3,  as 
long  as  the  targets  9 are  known.  This  shows  that  the  distance  measure  of  Khuri 
and  Conlon  can  also  be  thought  of  as  a loss  function,  though  the  cost  matrix  that 
is  used  in  not  practical  in  an  engineering  sense.  A more  general  choice  that  is 
suggested  is  C = KT,~l,  where  K is  a suitably  chosen  matrix  that  shapes  the 
variance-covariance  structure  to  reflect  the  economics  of  the  process.  With  this 
choice,  the  loss  function  becomes 


E (L)  = [y(x)  - 9^KY~l[y{x)  - 9]  + z’(x)(X'X ) 1 z(x)tvace[K]. 


Lai  and  Chang  (1994)  propose  a fuzzy  set  approach  to  multiple  response 
optimization  that  attempts  to  simultaneously  optimize  the  individual  responses 
while  minimizing  the  variance  of  the  predicted  responses.  This  is  in  the  same  vein 
as  minimizing  the  mean  square  error  that  Vining  proposed,  in  that  it  has  a penalty 
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for  not  being  at  the  overall  optimum  and  a penalty  for  the  quality  of  the  prediction. 
A description  of  this  method  would  necessitate  the  introduction  of  a large  amount 
of  fuzzy  set  theory,  fuzzy  regression,  and  a number  of  other  fuzzy  concepts,  and  will 
not  be  discussed  further  here. 

2.6  Criticisms  of  Multiple  Response  Optimization  Methods 

As  with  the  single  response  optimization  methods,  all  of  the  multiple  response 
optimization  methods  have  shortcomings  that  need  to  be  addressed.  The  method 
of  Derringer  and  Suich  does  have  an  intuitive  appeal  with  its  desirability  function, 
but  it  is  questionable  whether  the  desirability  function  has  any  real  correlation  with 
overall  product  quality.  For  instance,  one  choice  of  shape  factors  could  give  a final 
desirability  that  is  equal  to  0.70,  while  another  choice  of  shape  factors  could  give 
a final  desirability  of  0.75.  In  this  case,  it  is  not  clear  which  of  the  two  solutions 
gives  the  better  set  of  operating  conditions.  Even  though  0.75  is  a larger  desirability 
than  0.70,  both  are  a function  of  their  shape  parameters,  so  a comparison  is  very 
difficult  to  make  and  as  a result  the  desirability  has  no  real  meaning  in  and  of  itself. 
Ignoring  the  possibility  that  the  responses  could  be  strongly  correlated  is  a far  more 
serious  problem,  as  is  the  issue  of  multiple  shape  parameters  and  targets  that  must 
be  specified  by  the  experimenter.  Treating  the  responses  as  if  they  were  independent 
could  lead  to  a suboptimal  solution,  especially  when  some  of  the  responses  are 
highly  correlated.  Similarly,  poor  choices  of  the  values  of  the  shape  parameters  and 
targets  may  lead  to  a suboptimal  solution.  In  fact,  the  solution  itself  may  depend 
heavily  on  the  choice  of  these  parameters.  This  could  lead  to  a situation  where 
the  experimental  data  is  repeatedly  reoptimized  with  different  parameters  until  the 
experimenter  gets  the  solution  that  he  desires. 

Khuri  and  Conlon’s  distance  function  does  consider  the  correlation  among  the 
responses,  but  it  is  not  without  its  own  shortcomings.  Their  treatment  of  the  target, 
especially  in  larger  (or  smaller)  is  better  settings,  can  lead  to  serious  problems. 
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In  these  situations,  they  find  the  maximum  that  each  response  can  achieve  when 
treated  by  itself  and  use  this  as  the  target.  This  ignores  the  actual  specifications 
that  a product  may  have.  For  instance,  a response  may  be  able  to  achieve  a value 
of  500,  but  the  product  may  only  need  to  be  above  100,  and  anything  over  200  may 
not  add  anything  to  the  usefulness  of  the  product.  In  this  case,  Khuri  and  Conlon’s 
method  would  still  try  to  achieve  a response  of  500,  and  this  may  make  the  solution 
entirely  dependent  on  that  one  response,  and  in  fact  it  may  drive  other  responses  far 
from  their  desired  values.  A target  of  200  would  probably  be  a more  realistic  and 
lead  to  a better  overall  solution.  A situation  where  one  response  is  driven  far  from 
its  target  because  another  response  is  trying  to  reach  an  unreasonable  maximum 
will  be  illustrated  in  a later  example. 

The  method  proposed  by  Ames  et  al.  suffers  from  the  fact  that  it  ignores 
possible  correlations  among  the  responses.  In  addition,  a weight  is  required  for  each 
response,  which  may  not  be  easy  to  define.  Possible  choices  would  be  the  reciprocal 
of  the  variance  or  a function  of  the  cost  or  importance  of  each  response.  Using  the 
reciprocal  of  the  variance  is  a special  case  of  the  Khuri  and  Conlon  method.  In  any 
case,  the  final  solution  may  depend  more  on  the  choice  of  these  weights  than  the 
actual  data  of  the  experiment. 

The  method  that  Vining  proposes  addresses  some  of  the  problems  with  Khuri 
and  Conlon’s  distance  function,  most  notably  the  issue  with  the  appropriate  targets, 
but  has  the  same  problem  as  Ames  with  respect  to  the  definition  of  a cost  matrix. 
Vining  suggests  defining  the  cost  matrix  in  a manner  similar  to  Ames,  namely  the 
inverse  of  the  variance-covariance  matrix  or  a function  of  the  costs  of  each  response. 
Though  there  may  be  times  when  the  costs  can  be  easily  defined,  it  could  become 
too  subjective  to  make  for  a truly  optimal  solution.  Vining  recommends  that  the 
targets  be  chosen  in  a manner  consistent  with  that  of  Derringer  and  Suich.  In  the 
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cases  where  a response  is  to  be  maximized  or  minimized,  the  target  should  set  at  a 
level  above  or  below  which  there  is  no  further  benefit  to  the  product. 

2.7  Robust  Multiple  Response  Optimization 
All  of  the  multiple  response  optimization  methods  described  previously 
assume  that  the  variance-covariance  among  the  responses  is  constant  in  the 
experimental  region.  There  may  exist  situations  where  this  assumption  is  not  valid, 
and  a method  for  optimizing  a multiple  response  system  where  the  variance  is  not 
constant  in  the  experimental  region  must  be  addressed.  Pignatiello  (1993)  has 
addressed  this  issue  by  suggesting  that  the  mean  square  error  be  minimized,  similar 
to  what  Vining  suggested  for  multiple  response  systems,  though  this  appears  to  be 
the  only  attempt  at  optimization  under  the  assumption  of  nonconstant  variance. 

The  optimization  experiment  will  begin  in  a manner  similar  to  that  of  simple 
multiple  response  optimization.  The  optimization  experiment  will  begin  by  running 
an  experiment  capable  of  fitting  the  assumed  model  for  the  response.  Since  an 
estimate  of  the  variance  will  be  needed  at  some  of  the  points  of  the  experiment,  some 
of  the  experimental  points  will  need  to  be  replicated.  The  part  of  the  design  that  is 
replicated  must  be  able  to  support  the  model  that  is  assumed  for  the  variance.  At 
each  of  the  n design  points,  the  r responses  are  observed  as  yij,  with  i = 1, . . . , r 
and  j = 1, . . . ,n.  At  the  nr  points  that  have  been  replicated,  an  estimate  of  the 
mean  y^  and  the  variance  sf  ■ can  be  obtained,  with  i = 1, . . . , r and  j — 1, . . . , nr. 

Pignatiello  recommends  that  all  of  the  points  of  the  experiment  be  replicated 
and  uses  a loss  function  of  the  form 

L = [y{x)  - 0]'C[y(x)  - 0], 

where  C is  an  appropriate  positive  definite  matrix  of  costs  or  weights  and  0 is  the 
vector  of  target  values.  He  only  suggests  using  this  method  when  all  of  the  responses 
are  of  the  ’target  is  best’  situation.  Though  he  does  not  deal  with  ’larger  (or  smaller) 
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is  better’,  a target  could  be  chosen  in  the  manner  recommended  by  Derringer  and 
Suich,  among  others,  namely  choose  as  the  target  the  level  above  (or  below)  which 
the  product  shows  no  overall  increase  in  value.  An  estimate  of  the  expected  loss,  or 
mean  squared  error  (MSE),  is  then  given  by 

E(L)  = [y(x)  - 0}'C[y(x)  — 6}  + trace[CE(*)], 

where  S(cc)  is  an  estimate  of  the  variance-covariance  matrix,  which  does  depend  on 
the  location  in  the  experimental  region. 

The  strategy  that  Pignatiello  recommends  to  find  the  optimal  point  is  to 
estimate  E(L)  at  each  point  in  the  experimental  design,  and  then  use  traditional 
regression  techniques  on  the  expected  loss.  He  suggests  using  the  expected  loss  as 
the  response  of  interest  in  a sequential  set  of  experiments  to  find  the  optimal  setting 
of  the  control  variables,  as  in  response  surface  methodology.  He  suggests  beginning 
with  a small  design,  such  as  a fractional  factorial,  to  find  out  which  control  variables 
need  to  increase  or  decrease  in  order  to  reduce  the  estimate  of  E(L).  He  then 
suggests  a method  of  steepest  descent,  in  an  effort  to  find  the  direction  that  most 
quickly  decreases  the  expected  loss.  Other  approaches  for  finding  the  minimum 
expected  loss  are  mentioned,  but  the  overall  idea  is  similar  in  all  of  the  cases  and  the 
only  difference  between  them  is  the  set  of  assumptions  used  to  find  the  optimum. 

There  are  several  problems  with  this  method  that  must  be  addressed.  As 
with  other  methods,  a cost  or  weighting  matrix  is  introduced  and  must  be  specified 
by  the  experimenter.  Pignatiello  does  give  some  methods  to  specify  the  individual 
elements  of  the  matrix,  but  these  can  often  be  quite  subjective.  In  many  cases, 
the  overall  choice  of  the  optimum  may  depend  too  much  on  the  choice  of  these 
subjective  parameters.  Though  he  does  not  deal  with  the  case  of  ’larger  (or  smaller) 
is  better,’  the  most  likely  way  to  deal  with  it  is  to  introduce  ’artificial’  targets  for 
these  responses.  Again  these  choices  may  be  very  subjective  and  greatly  affect 
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the  final  optimal  point.  He  does  not  go  into  much  detail  on  how  he  estimates  the 
variance-covariance  matrix,  but  it  appears  that  he  estimates  it  at  every  point  that 
is  run  in  the  experiment.  Better  strategies  may  or  may  not  exist  that  give  a better 
overall  estimate  of  the  variance-covariance  matrix,  and  could  lead  to  more  efficient 
experimental  designs. 


2.8  Optimizing  a Function 

In  all  of  the  optimization  schemes  presented  thus  far,  the  experimental  data 
are  reduced  to  a single  function  which  is  to  be  either  maximized  or  minimized.  Some 
of  the  methods  add  a secondary  constraint,  and  all  have  a constraint  on  the  values 
that  x may  take.  Typically  the  solution  cannot  be  expressed  in  a closed  form,  so 
that  some  form  of  numerical  method  must  be  used.  Several  numerical  methods  are 
available,  including  the  generalized  reduced  gradient  method  mentioned  previously. 

Del  Castillo  and  Montgomery  proposed  using  the  generalized  reduced 
gradient  (GRG)  algorithm  for  solving  the  dual  response  surface  problem  presented 
by  Vining  and  Myers.  The  GRG  algorithm  was  chosen  because  it  is  known  to  work 
well  when  optimization  is  done  on  a small  region  and  the  constraints  are  not  highly 
nonlinear,  which  is  true  of  the  dual  response  surface  problem. 

Like  many  functional  maximization  algorithms,  the  GRG  uses  a gradient  to 
find  the  direction  of  steepest  ascent,  then  moves  in  this  direction  in  order  to  increase 
the  value  of  the  function.  The  first  step  of  the  algorithm  is  to  convert  the  inequality 
constraints  into  equality  constraints  through  the  use  of  slack  variables.  For  instance, 
a constraint  of  the  form  aq  + X2  > c is  rewritten  as  aq  + X2  + xs  = c,  where  the 
slack  variable  xs  must  be  nonnegative.  Note  that  constraints  involve  more  than  one 
variable,  thus  the  bounds  on  the  individual  variables  of  the  form  0 < aq  < 1.0  are 
not  deemed  to  be  constraints. 

After  the  constraints  are  modified,  there  will  be  m constraints  and  n variables, 
some  of  which  may  be  slack  variables.  The  n variables  are  grouped  into  two  sets, 
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basic  variables  and  nonbasic  variables.  The  basic  variables  are  those  variables  that 
are  strictly  between  their  bounds,  while  those  that  are  at  their  bounds  are  nonbasic 
variables.  The  system  of  m equations  in  n variables  typically  cannot  be  solved 
explicitly,  since  m is  usually  less  than  n.  However,  the  system  can  be  solved  for  m 
of  the  variables  (basic  variables)  in  terms  of  the  remaining  n — m nonbasic  variables. 
The  gradient  of  the  objective  function  with  respect  to  the  nonbasic  variables  is  called 
the  ’’reduced  gradient.”  If  the  magnitude  of  the  reduced  gradient  at  the  current 
point  is  small,  then  the  algorithm  stops.  Otherwise,  the  gradient  is  used  to  give  the 
direction  of  steepest  ascent  (or  descent  for  minimization  problems).  A line  search  is 
performed  in  this  direction,  a new  point  found,  and  the  process  repeated. 

Nelder  and  Mead  (1965)  proposed  a direct  search  algorithm,  utilizing  a 
simplex.  Box  (1965)  adapted  the  algorithm  for  constrained  maximization.  The 
method  does  not  assume  any  smoothness,  and  thus  does  not  utilize  gradients. 
Instead  it  uses  a simplex  of  points  in  the  space  of  the  variables.  The  vertex  of 
the  simplex  with  the  lowest  response  value  is  reflected  to  the  opposite  side  of  the 
simplex,  and  by  repeating  this  the  simplex  moves  towards  the  maximum  point.  By 
use  of  expansion  and  contractions  of  the  simplex,  it  will  eventually  surround  the 
maximum  point. 

The  simplex  is  initially  defined  by  k points,  one  of  which  is  specified  by  the 
user,  and  the  rest  are  randomly  generated.  The  number  of  vertices  of  the  simplex  is 
greater  than  the  dimensionality  of  the  surface,  in  order  to  reduce  the  possibility  that 
the  simplex  collapses  to  a subspace  of  the  variables.  All  of  the  initial  points  will  be 
chosen  to  satisfy  the  constraints  of  the  problem.  The  function  is  evaluated  at  all  of 
the  vertices  of  the  simplex,  and  the  vertex  with  the  lowest  response  value  is  reflected 
through  the  centroid  of  the  remaining  points  of  the  simplex.  The  new  point  will  be 
a > 1 times  as  far  from  the  centroid  as  the  original  vertex.  If  the  new  point  still 
has  the  smallest  value,  it  is  moved  half  way  towards  the  centroid  of  the  remaining 
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points.  If  the  new  point  is  outside  of  the  bounds  of  the  variables,  it  is  moved  just 
inside  of  the  boundary.  If  the  new  point  does  not  satisfy  some  of  the  constraints,  it 
is  moved  half  way  towards  the  centroid  of  the  remaining  points. 

The  simplex  method  tends  to  be  much  slower  than  the  gradient  method,  but 
they  both  tend  to  give  the  same  result  if  the  surface  is  reasonably  smooth.  Since 
the  simplex  does  not  assume  any  smoothness,  it  will  be  less  likely  to  fail  to  find  the 
true  optimum  due  to  unusual  functional  circumstances,  such  as  the  solution  being 
in  a corner  of  the  variable  space.  Though  it  is  slower  than  most  gradient  methods, 
the  simplex  method  does  still  give  a solution  in  well  under  a minute  on  modern 
computers  (Gateway  computer  with  an  Intel  Pentium  266  MHz  processor  with  64 
MB  of  RAM)  for  the  types  of  problems  that  have  been  dealt  with  here. 

As  with  any  maximization  algorithm,  one  must  be  aware  that  the  algorithm 
may  settle  on  a local  optimum.  In  order  to  guard  against  this,  one  commonly  uses 
a number  of  different  starting  points,  and  if  they  all  find  the  same  optimum,  it  can 
be  safely  assumed  that  it  is  the  global  optimum.  For  simple  functions  in  a small 
number  of  variables,  one  may  construct  a grid  in  the  variable  space  and  evaluate  the 
function  at  every  point  in  the  grid.  Contour  plots  can  then  be  generated  to  look  at 
the  surface  of  the  function  to  verify  if  it  is  a local  or  global  optimum. 


CHAPTER  3 

ROBUST  PARAMETER  DESIGN 


3.1  A Description  of  the  Proposed  Method 
As  with  the  other  methods  that  have  been  previously  introduced,  this 
method  begins  with  a response  surface  experiment  that  will  allow  both  the  mean 
and  variance  (or  a function  of  the  variance)  to  be  fit  to  a second  order  model  in  the 
p control  variables.  The  fit  of  the  mean  of  the  response  is  given  by 
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and  model  for  the  standard  deviation  of  the  response  is  given  by 
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where  the  /3  and  7 are  the  least  squares  estimates  of  the  true  parameters.  It  is  at 
this  point  that  all  of  the  proposed  methods  attempt  something  different.  Vining  and 
Myers  proposed  that  a constrained  optimization  be  performed.  Lin  and  Tu  proposed 
that  a squared  error  loss  function  be  minimized.  Copeland  and  Nelson  introduced  a 
slightly  different  constrained  optimization  than  the  method  proposed  by  Vining  and 
Myers.  Finally,  Kim  and  Lin  took  a fuzzy  set  theory  approach  to  the  optimization. 

Taguchi’s  idea  is  that  the  variance  of  a process  should  be  kept  as  small  as 
possible,  while  keeping  the  mean  of  the  process  at  a specific  target.  If  one  allows 
for  bias,  or  the  mean  not  equaling  the  target,  this  problem  can  also  be  approached 
by  keeping  the  response  close  to  target  with  the  highest  probability,  or  maximizing 
the  probability  that  the  quality  characteristic  y is  within  its  specifications,  ymin  and 
ym ax-  Thus  the  optimal  solution  is  the  location  of  the  control  parameters  x that  will: 
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• Target  is  best:  maximize  P (ymin  < y < ymax ) 

• Larger  is  better:  maximize  P (ymin  < y) 

• Smaller  is  better:  maximize  P (y  < ymax ) 

For  simplicity,  the  probability  will  be  calculated  by  assuming  that  the  response  of 
interest  y is  normal  with  mean  wp  and  standard  deviation  wa  (which  both  depend 
on  the  control  variables  x).  As  will  be  seen  later,  the  assumption  of  normality  is  not 
a necessary  assumption,  but  does  simplify  calculations.  One  could  choose  another 
distribution  and  calculate  the  probability  in  order  to  find  the  optimal  setting. 

This  approach  is  similar  to  the  goal  of  a high  Cpk,  which  is  becoming  popular 
in  industry.  A thorough  treatment  of  capability  indices  can  be  found  in  Kotz  and 
Johnson  (1993),  and  a discussion  of  issues  surrounding  the  use  of  capability  indices 
can  be  found  in  Rodriguez  (1992).  The  index  Cpk  is  a measure  of  how  far  the  mean 
y of  a distribution  is  from  its  closest  specification  limit,  either  ymin  or  ymax ■ The 
distance  is  measured  in  terms  of  the  standard  deviation  of  the  distribution,  given 
by  the  usual  unbiased  estimate  s.  The  idea  of  Cpk  is  to  have  a measure  of  the 
percentage  of  product  that  is  failing  that  is  easy  to  calculate.  The  equation  for  Cpk 
is  usually  given  by  the  minimum  of  the  lower  and  upper  capability  indices,  or 
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If  one  assumes  normality,  then  the  lower  and  upper  capability  indices  are  simply 
multiples  of  a standard  normal  2 statistic,  and  the  probability  of  being  out  of 
specification  is  easily  calculated  by  P (z  > 3 Cpi)  + P(z  > 3Cpu).  In  many  cases, 
one  of  these  two  probabilities  is  very  nearly  0,  so  the  probability  of  being  out  of 
specification  is  very  nearly  P(z  > 3 Cpk)-  Cpk  is  not,  however,  a meaningful  measure 
if  the  assumption  of  normality  does  not  hold.  The  assumption  of  normality  is 
what  gives  the  Cpk  the  close  relationship  with  the  probability  of  being  outside  the 
specification  limits. 
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The  proposed  probability  method  has  intuitive  appeal,  because  most  people 
will  easily  understand  the  problem  being  solved,  that  of  measuring  the  probability 
that  the  process  is  within  specifications.  This  is  in  direct  contrast  to  the  Kim 
and  Lin  method,  in  which  A,  the  value  of  a membership  function,  has  no  direct 
interpretation.  The  proposed  method  does  simultaneously  minimize  variation  while 
staying  close  to  target,  and  uses  the  same  basic  methodology  for  all  three  cases 
listed,  unlike  any  of  the  previously  proposed  methods.  It  also  answers  all  of  the 
issues  presented  by  the  other  proposed  methods,  while  introducing  few  new  issues. 

One  issue  that  other  methods  have  addressed  is  that  of  allowing  some  bias 
into  the  final  solution,  or  allowing  the  final  solution  to  have  the  estimated  mean 
Wfj,  not  equal  to  its  target.  This  method  does  allow  for  some  bias  in  the  optimal 
solution,  but  it  does  penalize  for  too  much  bias.  As  the  bias  gets  larger,  the  mean 
is  moving  closer  to  the  maximum  or  minimum  specification  limits,  thus  decreasing 
the  probability  that  is  to  be  maximized.  Thus  the  bias  will  want  to  be  kept  small. 
If  the  amount  of  bias  is  a critical  concern,  the  maximum  and  minimum  specification 
limits  can  be  made  very  close  to  each  other.  This  would  be  akin  to  making  A small 
in  the  method  of  Copeland  and  Nelson. 

An  issue  that  has  not  been  addressed  by  other  methods  is  that  of  the 
constraint  that  the  variance  be  kept  at  a specific  level,  which  is  done  in  the 
constrained  optimization  method  of  Vining  and  Myers,  as  well  as  that  proposed  by 
Copeland  and  Nelson.  In  the  case  of  larger  (or  smaller)  is  better,  these  two  methods 
recommend  that  the  secondary  constraint  be  that  the  variance  be  set  at  or  below 
some  acceptable  value  a$.  The  choice  of  an  acceptable  amount  of  variance  can 
be  quite  subjective.  One  recommendation  is  to  use  a number  of  different  choices 
of  <Tq,  and  then  pick  the  one  that  gives  the  best  balance  between  the  mean  and 
variance.  No  mechanism  is  given  to  compare  these  competing  solutions,  however. 
The  proposed  probability  method  is  not  forced  to  use  this  constraint,  since  the  goal 
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is  to  balance  the  mean  and  the  variance  simultaneously,  via  a probability.  Also  in 
the  case  of  larger  (or  smaller)  is  better,  there  is  no  need  to  introduce  an  artificial 
definition  of  bias.  The  squared  error  loss  method  of  Lin  and  Tu  minimizes  a measure 
that  has  a variance  term  and  a bias  term.  The  bias  term  is  defined  as  the  distance 
that  the  mean  is  from  its  target,  but  there  is  no  clear  target  when  the  response  is 
to  maximized  (or  minimized).  In  these  situations,  an  artificial  target  is  selected 
anyway,  though  its  choice  can  be  quite  subjective  and  greatly  influence  the  final 
solution. 

Though  the  probability  method  must  specify  a minimum  and  maximum  limit 
for  the  response,  there  are  far  fewer  parameters  introduced  than  the  method  of  Kim 
and  Lin.  Kim  and  Lin  need  the  experimenter  to  define  shape  parameters  for  both 
the  mean  and  the  variance.  If  one  assumes  normality,  the  choice  of  the  minimum 
and  maximum  will  frequently  not  affect  the  choice  of  the  optimal  setting,  as  long 
as  the  maximum  and  minimum  are  symmetric  about  the  target.  As  can  be  seen  in 
Figure  3.1,  a solution  that  maximizes  P (yl  < y < yu)  will  be  almost  identical  to 
a solution  that  maximizes  P (yl*  < y < yu*).  The  Kim  and  Lin  method  gives  a 
solution  that  can  be  very  dependent  on  the  shape  parameters  that  must  be  specified, 
as  will  be  shown  in  a later  example. 

The  use  of  the  shape  parameters  in  the  Kim  and  Lin  method  seems  to  be 
motivated  by  a situation  where  being  out  of  specification  on  one  side  is  less  costly 
than  the  other.  For  instance,  it  may  be  relatively  inexpensive  to  repair  a part  that 
is  too  large,  while  a part  that  is  too  small  may  have  to  be  scrapped  altogether.  The 
proposed  methodology  is  easily  modified  for  such  an  economic  situation.  Instead  of 
maximizing  the  probability  of  being  within  specification  limits,  one  could  minimize 
the  expected  cost  of  failing  material,  given  by  C\P(y  < ymin ) + C2-P(y  > ymax),  where 
ci  and  C2  are  the  costs  of  failing  low  and  high,  respectively.  Though  this  would  add 
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Figure  3.1:  Optimal  Solution  and  Two  Sets  of  Limits. 
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two  extra  parameters,  the  cost  parameters  could  be  much  easier  for  an  experimenter 
to  specify  than  the  shape  parameters  necessary  for  the  method  of  Kim  and  Lin. 

Thus  the  probability  method  seems  to  answer  all  of  the  issues  presented  in 
previous  approaches  to  the  problem.  Issues  that  may  arise  from  this  method  would 
be  the  assumption  of  normality,  no  direct  mention  of  a target  value  in  the  solution, 
and  the  usual  issue  of  how  much  bias  may  enter  the  solution.  As  for  the  assumption 
of  normality,  it  is  not  critical  and  is  only  used  for  simplicity.  One  could  use  any 
distributional  assumptions,  as  long  as  the  probability  is  maximized  accordingly,  as 
will  be  shown  in  an  example.  The  target  does  enter  the  solution  as  being  half  way 
between  the  maximum  and  minimum  limits,  though  this  is  not  explicit.  This  does 
not  seem  to  be  a major  issue,  since  the  minimum  and  maximum  may  be  made  very 
close,  which  will  force  the  optimal  setting  near  the  target.  This  will  also  keep  the 
bias  within  some  bounds,  if  that  is  critical. 

Two  situations  may  cause  this  method  to  have  troubles.  The  first  is  if  the 
mean  response  is  unable  to  fall  within  the  upper  and  lower  limits,  which  will  cause 
this  method  to  try  to  find  an  area  with  a very  large  variance.  Figure  3.2  shows 
two  possible  distributions  in  a larger  is  better  situation.  It  can  be  easily  seen  that 
P(V  > yl)  is  smaller  for  distribution  1 than  it  is  for  distribution  2.  The  probability 
method  would  choose  distribution  2 over  distribution  1,  despite  the  fact  that 
distribution  2 has  a lower  mean  and  a larger  variance,  neither  of  which  is  consistent 
with  a situation  where  the  response  is  to  be  maximized.  In  this  situation,  a small 
variance  would  cause  the  process  to  be  consistently  outside  of  the  specification 
window.  A large  variance,  on  the  other  hand,  would  get  at  least  some  of  the  product 
within  specification  some  of  the  time.  Either  way,  this  would  not  be  a desirable 
process  for  most  people.  The  second  situation  that  could  cause  problems  would  be  if 
the  specification  window  was  very  wide  or  narrow.  In  this  situation,  the  probability 
of  being  within  specification  would  be  almost  identical  (and  almost  always  either  0 
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Figure  3.2:  Two  Possible  Solutions  for  Larger  is  Better. 

or  1)  for  the  entire  experimental  region.  This  would  cause  the  optimization  routine 
to  become  very  unstable,  and  would  likely  return  different  locations  for  the  optimum 
for  different  starting  values,  though  all  locations  would  have  essentially  that  same 
probability  of  being  within  specification.  It  would  probably  be  best  to  modify  the 
specification  limits  if  this  should  arise,  so  that  the  probability  being  maximized  is 
not  so  close  to  0 or  1. 


3.2  Printing  Process  Example 

The  example  will  be  taken  from  Box  and  Draper  (1987),  which  also  appears 
in  Vining  and  Myers,  Lin  and  Tu,  and  Kim  and  Lin.  The  purpose  of  the  experiment 
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was  to  determine  the  effect  of  speed  (xi),  pressure  (X2),  and  distance  (x3)  on  the 
quality  of  a printing  process.  The  experiment  was  conducted  in  a 33  factorial  design 
with  three  replicates  at  each  design  point.  The  data  set  with  the  estimated  responses 
is  in  Table  3.1,  and  the  summary  statistics  of  the  fit  of  the  two  models  in  Table  3.2. 
A second  order  model  was  fit  for  both  the  mean  and  standard  deviation  as  follows 
(Vining  and  Myers,  1990): 


w „ =327.6  + 177-Oxi  + 109.4x2  + 131.5x3  + 32.0x? 

— 22.4x2  ~ 29.IX3  + 66.0xix2  + 75.5xix2  + 43.6x2x3  (3.1) 

wa  =34.9  + 11.5xi  + 15.3x2  + 29.2x3  + 4.2x2 

— 1.3x2  + I6.8X3  + 7.7xix2  + 5.1xiX3  + 14.1x2x3  (3.2) 

3.2.1  Case  1:  Target  is  Best 

In  order  to  compare  all  of  the  different  methods,  the  bounds  that  Kim  and 
Lin  proposed  will  be  used,  namely  ie“in  = 490,  ie™ax  = 510,  no  — 500,  u;“in  = 
x/1500  = 38.73,  and  w ““  = \/2100  = 45.83.  The  shape  parameters  that  will  be 
used  are  d M = -4.39,  -1.70,0, 1.70,4.39,  and  da  = 0.  The  region  of  interest  will  be 
a sphere  of  radius  1.  Table  3.3  gives  the  optimal  settings  from  all  of  the  methods 
mentioned,  where  the  traditional  method  is  that  of  Vining  and  Myers  combined  with 
Del  Castillo  and  Montgomery.  The  final  column  is  the  probability  that  the  response 
will  be  between  the  upper  and  lower  bounds  given  above,  assuming  normality  with 
mean  and  standard  deviation  wa.  This  is  calculated  as 

/>510 

P(490  < y < 510)  = / (27rw^)”1/2exp(-(y  - w^)2 /(2wl))dy. 

J 490 

It  should  be  noted  that  these  solutions  were  arrived  at  using  the  solver 
command  in  Microsoft  Excel,  version  7.0.  Solver  is  a generalized  reduced  gradient 


u 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20 

21 

22 

23 

24 

25 

26 

27 
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Table  3.1:  Printing  Study  Data. 


Xi 

Xz 

Uul 

Uu  2 

Uu3 

-1 

-l 

-1 

34 

10 

28 

0 

-l 

-1 

115 

116 

130 

1 

-l 

-1 

192 

186 

263 

-1 

0 

-1 

82 

88 

88 

0 

0 

-1 

44 

178 

188 

1 

0 

-1 

322 

350 

350 

-1 

1 

-1 

141 

110 

86 

0 

1 

-1 

259 

251 

259 

1 

1 

-1 

290 

280 

245 

-1 

-1 

0 

81 

81 

81 

0 

-1 

0 

90 

122 

93 

1 

-1 

0 

319 

376 

376 

-1 

0 

0 

180 

180 

154 

0 

0 

0 

372 

372 

372 

1 

0 

0 

541 

568 

396 

-1 

1 

0 

288 

192 

312 

0 

1 

0 

432 

336 

513 

1 

1 

0 

713 

725 

754 

-1 

-1 

1 

364 

99 

199 

0 

-1 

1 

232 

221 

266 

1 

-1 

1 

408 

415 

443 

-1 

0 

1 

182 

233 

182 

0 

0 

1 

507 

515 

434 

1 

0 

1 

846 

535 

640 

-1 

1 

1 

236 

126 

168 

0 

1 

1 

660 

440 

403 

1 

1 

1 

878 

991 

1161 

Vu 

w( J 

24.0 

12.5 

75.4 

25.4 

120.3 

8.4 

78.9 

19.9 

213.7 

42.8 

146.4 

22.8 

86.0 

3.7 

97.6 

20.3 

136.7 

80.4 

167.1 

22.5 

340.7 

16.2 

300.6 

33.1 

112.3 

27.6 

75.0 

12.5 

256.3 

4.6 

210.6 

22.4 

271.7 

23.6 

410.1 

40.7 

81.0 

0.0 

116.8 

18.6 

101.7 

17.7 

195.8 

18.2 

357.0 

32.9 

338.9 

26.3 

171.3 

15.0 

182.6 

27.6 

372.0 

0.0 

327.6 

34.9 

501.7 

92.5 

536.6 

50.6 

264.0 

63.5 

203.6 

33.8 

427.0 

88.6 

414.7 

48.9 

730.7 

21.1 

689.7 

72.3 

220.7 

133.8 

100.2 

45.4 

239.7 

23.5 

254.6 

50.1 

422.0 

18.5 

473.1 

63.3 

199.0 

29.4 

209.6 

68.4 

485.3 

44.6 

430.1 

80.9 

673.7 

158.2 

714.5 

101.7 

176.7 

55.5 

274.2 

88.8 

501.0 

138.9 

560.7 

108.9 

1010.0 

142.5 

911.2 

137.5 

Table  3.2:  Summary  of  Fits  of  the  Two  Responses. 


I 

r 


R2 

RlM 


A 


0.927 

0.888 


wa 

0.454 

0.165 
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Table  3.3:  Target  is  Best,  Limits  of  490  and  510. 


Method 

Optimal  Setting 

Wa 

Prob 

Proposed 

(0.983,  0.003,  -0.182) 

494.61 

44.66 

0.1759 

Traditional 

(0.984,  0.025,  -0.175) 

500.00 

45.32 

0.1746 

Lin-Tu 

(0.983,  0.002,  -0.182) 

494.53 

44.65 

0.1759 

Copeland-Nelson 

A=5 

(0.983,  0.004,  -0.182) 

495.00 

44.71 

0.1759 

A=1 

(0.984,  0.021,  -0.177) 

499.00 

44.20 

0.1751 

Kim-Lin 
dfj,= 4.39 

(0.975,  -0.005,  -0.187) 

490.57 

44.23 

0.1749 

dM= 1.70 

(0.983,  -0.014,  -0.185) 

491.19 

44.24 

0.1754 

d^= 0 

(0.977,  0.003,  -0.184) 

492.02 

44.39 

0.1754 

dp=- 1.70 

(0.983,  -0.002,  -0.183) 

493.52 

44.53 

0.1759 

4.39 

(0.979,  0.042,  -0.202) 

495.74 

44.81 

0.1758 

algorithm  for  finding  a constrained  maximum.  The  details  on  the  use  of  the  solver 
command  can  be  found  in  the  Appendix  A.  It  should  be  noted  that  with  any 
maximization  or  minimization  algorithm,  there  is  the  chance  that  a local,  rather 
than  global,  optimum  will  be  found.  A common  way  to  guard  against  this  is  to  use 
a number  of  different  starting  points.  If  all  of  the  different  starting  points  end  up  at 
the  same  optimum,  it  may  be  safely  assumed  that  it  is  the  true  global  optimum.  It 
may  also  be  possible  to  evaluate  the  function  at  every  point  of  a rough  grid  of  the 
experimental  region.  The  value  at  the  optimum  that  has  been  found  can  then  be 
compared  to  the  values  over  the  rough  grid  to  determine  whether  it  is  likely  that  it 
is  a true  global  optimum. 

As  Table  3.3  shows,  there  is  very  little  real  difference  among  most  of  the 
solutions.  The  proposed  method  is  essentially  identical  to  the  method  of  Lin  and  Tu. 
When  one  is  confronted  with  many  similar  solutions,  the  simplest  and  most  robust  is 
most  desirable.  In  order  to  check  for  robustness,  the  above  is  rerun  with  new  limits 
of  w™in  = 450,  rc“ax  = 550  (approximately  ±1  standard  deviation),  w ““  = 50,  and 
all  other  limits  the  same.  Note  that  Kim  and  Lin  is  the  only  other  method  that  uses 
these  limits,  so  the  location  of  the  other  solutions  will  stay  the  same,  although  the 
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Table  3.4:  Target  is  Best,  Limits  of  450  and  550. 


Method 

Optimal  Setting 

Wa 

Prob 

Proposed 

(0.983,  0.003,  -0.183) 

494.58 

44.66 

0.7336 

Traditional 

(0.984,  0.025,  -0.175) 

500.00 

45.32 

0.7300 

Lin-Tu 

(0.983,  0.002,  -0.182) 

494.53 

44.65 

0.7336 

Copeland-Nelson 

A=5 

(0.983,  0.004,  -0.182) 

495.00 

44.71 

0.7336 

A=1 

(0.984,  0.021,  -0.177) 

499.00 

45.20 

0.7312 

Kim-Lin 

gL=4.39 

(0.937,  -0.047,  -0.232) 

465.57 

41.50 

0.6253 

a- 

■ 

II 

H- 1 
• 

O 

(0.948,  -0.034,  -0.215) 

473.66 

42.38 

0.6759 

d»= 0 

(0.979,  -0.057,  -0.195) 

480.97 

43.02 

0.7099 

^=-1.70 

(0.968,  -0.013,  -0.191) 

486.53 

43.80 

0.7243 

dfj,=- 4.39 

(0.978,  -0.003,  -0.184) 

492.16 

44.41 

0.7324 

calculation  of  the  probability  will  change,  due  to  the  new  limits.  Table  3.4  gives  the 
results  for  all  of  the  methods. 

As  can  be  seen  in  Table  3.4,  the  method  of  Kim  and  Lin  has  some  serious 
problems  with  bias.  While  the  other  methods  have  a bias  in  the  range  of  5 or  less, 
Kim  and  Lin’s  method  has  biases  ranging  from  8 to  34,  despite  their  claim  that 
their  method  will  limit  the  bias.  Note  that  the  bias  of  34.43  is  approaching  one 
full  standard  deviation  (41.50)  from  target.  One  of  the  concerns  of  the  Kim  and 
Lin  method  was  that  the  final  solution  would  depend  too  much  on  the  choice  of 
the  shape  parameters,  as  well  as  the  choice  of  upper  and  lower  limits.  As  Table 
3.4  shows,  there  is  a great  deal  of  difference  between  the  solutions  with  shape 
parameters  of  +4.39  and  -4.39.  The  predicted  mean  can  be  as  close  as  8 from  target 
or  as  far  as  34.5  from  target.  In  addition,  the  solutions  are  quite  different  when  the 
upper  and  lower  limits  are  changed,  as  can  be  seen  when  comparing  Table  3.3  with 
Table  3.4.  Fixing  the  shape  parameter  at  0,  when  the  limits  are  490  and  510,  the 
estimated  mean  w ^ is  492,  but  it  is  481  when  the  limits  are  changed  to  450  and  550. 
The  proposed  probability  method,  as  expected,  remains  essentially  the  same  when 
the  limits  are  changed.  With  the  tighter  limits,  the  estimated  mean  w M is  494.61, 
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Table  3.5:  Larger  is  Better,  Lower  Limit  of  550. 


Method 

Optimal  Setting 

A 

A 

a 

Prob 

Proposed 

Traditional 

(0.818,  0.415,  0.399) 
(0.946,  0.312,  0.088) 

637.35 

594.02 

74.17 

60.00 

0.8806 

0.7684 

and  it  only  changes  to  494.58  when  the  limits  are  widened  to  450  and  550.  The 
standard  deviation  wa  does  not  change  when  the  limits  are  changed.  This  shows 
that  the  proposed  probability  method  is  insensitive  to  the  choices  of  the  upper  and 
lower  limits. 

3.2.2  Case  2:  Larger  is  Better 

Vining  and  Myers  originally  looked  at  the  case  of  maximizing  the  mean 
while  restricting  the  standard  deviation  to  60.  Del  Castillo  and  Montgomery  also 
analyzed  this  situation,  as  did  Copeland  and  Nelson,  whose  restriction  was  keeping 
the  standard  deviation  less  than  or  equal  to  60.  These  methods  will  be  compared 
to  the  proposed  method  of  maximizing  the  probability  that  the  response  is  greater 
than  550 

poo 

P(550  <y)=  (27 Twl)~1/2exp{-(y  - /(2wl))dy. 

J 550 

The  maximization  will  be  over  a spherical  region  of  radius  1.  The  results  are  in 
Table  3.5. 

By  not  having  a restriction  on  the  standard  deviation,  the  proposed  method 
comes  up  with  a much  higher  response  value,  with  only  a moderately  higher  value 
of  the  standard  deviation,  which  results  in  a much  higher  probability  of  success.  Of 
course  one  could  always  change  the  restriction  in  the  traditional  method  to  keeping 
the  standard  deviation  less  than  75,  and  one  would  expect  a solution  similar  to  the 
proposed  method.  One  advantage  that  the  proposed  method  has  over  the  other 
methods  is  the  ability  to  compare  two  potential  solutions  such  as  those  above. 
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Table  3.6:  Larger  is  Better,  Comparing  Different  Lower  Limits. 


Lower  Limit 

Optimal  Setting 

Wfj, 

wa 

Prob  > 550 

500 

(0.847,  0.404,  0.346) 

633.81 

71.84 

0.8783 

550 

(0.818,  0.415,  0.399) 

637.35 

74.17 

0.8806 

600 

(0.789,  0.423,  0.445) 

639.07 

76.18 

0.8788 

650 

(0.762,  0.429,  0.485) 

639.40 

77.88 

0.8745 

Indeed  Vining  and  Myers  solve  the  problem  for  standard  deviations  of  60,  75,  and 
90,  but  no  methodology  is  given  to  compare  the  three  competing  solutions.  Their 
only  advise  is  that  “the  user  must  decide  which  of  these  settings  offers  the  best 
compromise.”  Previous  methods  have  had  to  assume  an  upper  limit  on  the  standard 
deviation,  but  no  method  has  been  given  to  choose  an  appropriate  upper  limit. 

Note  that  the  results  of  Copeland  and  Nelson  were  almost  identical  to  those 
of  the  Vining  and  Myers  method.  The  method  of  Lin  and  Tu  does  not  address 
the  case  of  larger  is  better.  The  method  of  Kim  and  Lin  was  not  looked  at  due  to 
the  many  extra  parameters  that  would  need  to  be  specified,  as  well  as  its  inferior 
performance  in  the  case  of  target  is  best. 

In  order  to  assess  how  much  the  final  solution  depends  on  the  lower  limit  that 
is  used,  I found  the  optimal  point  for  lower  limits  of  500,  600,  and  650,  which  are 
displayed  in  Table  3.6.  In  order  to  compare  the  different  solutions,  the  probability 
that  the  response  is  greater  than  550  is  calculated  for  all  of  the  cases 

roc 

P(550  <y)=  (27 Twl)~1/2exp(-(y  - w^f  /(2w2a))dy. 

J 550 

Although  the  x locations  change  by  a small  amount,  the  probability  of  being  above 
550  does  not  change  too  much,  which  shows  that  this  method  appears  to  be  robust 
to  the  choice  of  the  lower  limit. 
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Table  3.7:  Smaller  is  Better,  Upper  Limit  of  150. 


Method 

Optimal  Setting 

A 

A 

a 

Prob 

Proposed 

(-0.399,  -0.451,  -0.798) 

136.34 

19.25 

0.7611 

Lin-Tu 

(-0.392,  -0.421,  -0.818) 

136.26 

19.44 

0.7602 

Copeland-Nelson 

(-0.391,  -0.414,  -0.822) 

136.25 

19.48 

0.7598 

3.2.3  Case  3:  Smaller  is  Better 

Copeland  and  Nelson  looked  at  the  case  of  smaller  is  better,  as  did  Lin  and 
Tu.  These  two  methods  will  come  up  with  nearly  identical  results  when  optimization 
is  over  a sphere  of  radius  1.  The  proposed  method  maximizes  the  probability  that 
the  response  is  less  than  150 

/150 

(27 Twl)-1/2exp(-(y  - wtf/{2wl))dy. 

-OO 

All  three  come  up  with  nearly  identical  solutions,  as  given  in  Table  3.7. 

The  traditional  Vining  and  Myers  method  will  be  identical  to  the  Copeland 
and  Nelson  solution.  Again  the  Kim  and  Lin  method  was  not  considered  for  the 
same  reasons  as  in  the  larger  is  better  situation. 

3.3  Example  with  Lifetime  Data 

This  example  is  simulated  for  a hypothetical  reliability  experiment.  The 
experiment  has  four  equally  spaced  levels  of  a control  variable,  x,  which  are  coded 
— 1,  — and  1,  and  we  are  interested  in  the  time  until  failure,  t,  measured  in 

o o 

hundreds  of  hours.  The  failure  time  data  is  randomly  generated  from  a Weibull 
distribution,  whose  density  is  given  by 

f(t)=a(3{at)l3-1e-{at)0. 

The  true  a and  /3  depend  on  x according  to  the  following  equations: 

a = 4.6992  - 10.88a:  + 8a:2 


55 


Table  3.8:  Time  Until  Failure  Data  (Time  in  Hundreds  of  Hours). 


X 

-i 

j 

3 

1 

3 

1 

True  a 

23.58 

9.21 

1.96 

1.82 

True  P 

0.576 

0.649 

0.540 

0.248 

Sample 

0.1283,  0.4431 

0.1082,  0.0239 

0.1020,  0.3425 

0.6462,  3.5592 

0.0040,  0.2122 

0.0068,  0.1066 

0.2261,  3.5968 

0.3835,  6.5698 

0.0886,  0.2198 

0.4795,  0.5322 

0.2177,  0.0683 

3.6837,  0.0962 

2.6648,  0.0286 

0.1203,  1.4320 

2.5847,  1.8744 

0.0007,  0.0050 

0.0040,  0.0563 

0.0203,  0.3836 

9.2947,  0.2816 

0.0847,  0.0003 

/s 

a 

4.6771 

3.8210 

0.7505 

1.8988 

— — 

p 

0.5603 

0.7259 

0.6552 

0.3736 

A 

V 

0.3850 

0.3213 

1.8589 

1.5029 

A 

a 

0.8122 

0.4368 

2.8921 

2.2947 

P = 0.6172  - 0.164a;  - 0.205a;2. 

At  each  level  of  x,  a random  sample  of  size  10  is  generated  from  a Weibull 
distribution,  with  the  appropriate  a and  ft.  Table  3.8  lists  the  data  for  each  level  of 
x.  The  table  lists  the  true  values  of  a and  ft  that  were  used  to  generate  the  sample, 

A 

as  well  as  the  maximum  likelihood  estimates,  a and  P,  and  the  sample  mean,  ft,  and 
standard  deviation,  a.  Note  that  the  maximum  likelihood  estimates  of  the  Weibull 
parameters,  ot  and  P must  be  found  using  a numerical  procedure,  since  no  closed 
form  exists  for  these.  The  details  can  be  found  in  any  book  on  survival  analysis  or 
reliability,  such  as  Lawless  (1982). 

The  purpose  of  this  analysis  is  to  show  that  the  proposed  method  is  flexible 
enough  to  handle  a distribution  that  is  not  normal.  Furthermore,  it  will  show  that 
one  could  get  a solution  that  would  be  far  from  optimal,  if  normality  is  assumed. 
Four  analyses  are  given  in  Table  3.9,  they  are: 

• Truth:  This  is  the  location  of  x in  [-1, 1]  that  maximizes  the  probability  that 
the  time  until  failure  is  greater  than  0.1,  P (t  > 0.1).  This  is  done  by  the 
maximization  algorithm  of  Nelder  and  Mead  that  has  been  discussed  previously. 
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Table  3.9:  Optimization  Information  for  Different  Distributional  Assumptions. 


Distribution 

Parameter  Estimates 

R2 

Optimal  x 

P(t  > 0.1) 

Truth 

0.5510 

0.6952 

Weibull 

d = 2.161  - 1.711a;  + 1.127a:2 
$ = 0.719  - 0.095a;  - 0.252a;2 

0.7839 

0.9995 

0.2819 

0.6446 

Normal 

p=  1.108  + 0.734a; -0.164a;2 
a = 1.678  + 1.035a;  - 0.125a;2 

0.6660 

0.5805 

1.0000 

0.5194 

Lognormal 

p = —1.189  + 0.346a;  - 1.178a;2 
a = 1.534  + 0.709a;  + 1.273a;2 

0.5688 

0.9456 

0.0278 

0.5460 

A 

• Weibull:  The  estimated  parameters,  a and  /?,  are  separately  fit  to  second  order 
models  in  x.  It  is  then  possible  to  calculate  an  estimate  of  the  probability  that 
the  time  until  failure  is  greater  than  0.1,  assuming  that  t is  Weibull  distributed 

A 

with  parameters  a(x)  and  f3(x) 

/»00 

P (t  > 0.1)  = / d(a;)^(x)(d(a:)t)^:r)-1e-^(a:)^<l)dt. 

J 0.1 

We  then  find  the  location  of  £ in  [-1, 1]  that  maximizes  the  estimate  of  P(t  > 
0.1).  Table  3.9  gives  the  individual  fitted  equations,  as  well  as  the  R2  of  the 
fit,  the  location  of  the  optimal  x,  and  the  true  P (t  > 0.1)  for  that  x.  Note  that 
since  we  know  that  the  data  were  generated  from  a Weibull  distribution  with 
the  functional  form  of  a and  /?  given  above,  we  are  able  to  calculate  the  true 
P (t  > 0.1). 

• Normal:  The  estimated  parameters,  p and  d,  are  separately  fit  to  second  order 
models  in  x.  It  is  then  possible  to  calculate  an  estimate  of  the  probability  that 
time  until  failure  is  greater  than  0.1,  assuming  that  t is  normally  distributed 
with  parameters  p(x)  and  a(x) 

poo 

P (t  > 0.1)  = / (27T d2(x))~ll2exp(-(t  - p(x))2/(2 a2(x)))dt. 

J o.i 
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We  then  find  the  location  of  £ in  [-1, 1]  that  maximizes  the  estimate  of  P(t  > 
0.1).  Table  3.9  again  gives  the  pertinent  information.  Note  that  the  probability 
given  is  the  true  probability  for  the  x given. 

• Lognormal:  In  many  situations  when  heavily  skewed  data  is  encountered,  the 
experimenter  will  try  to  transform  the  data  to  make  it  more  normal  and  then 
analyze  the  transformed  data  as  though  it  were  normal.  In  this  case,  a logarithm 
transformation  would  be  a good  choice.  For  the  lognormal  line  of  Table  3.9,  all 
of  the  original  times  to  failure  are  transformed,  then  everything  procedes  as  with 
the  normal  case  above.  The  means  and  standard  deviations  of  the  transformed 
data  are  not  given  in  Table  3.8,  but  they  are  used  to  fit  second  order  equations 
in  x,  and  thus  find  the  optimal  x as  in  the  normal  case  above.  Table  3.9  again 
gives  the  pertinent  information,  and  the  probability  is  the  true  probability  for 
that  x. 

As  the  table  shows,  it  is  possible  to  find  an  optimal  setting  for  x and  not 
have  to  rely  on  the  assumption  of  normality.  Furthermore,  the  solution  assuming 
normality  is  far  from  optimal,  even  suggesting  that  the  optimal  setting  is  not  within 
the  experimental  region.  Taking  the  logarithm  transformation  does  not  seem  to  help 
very  much  in  this  situation,  though  it  is  an  improvement  over  the  assumption  of 
normality.  The  Weibull  solution  is  better  than  the  normal  or  lognormal  solution,  but 
it  is  a fair  distance  from  the  true  optimum  solution.  I believe  that  this  is  due  more 
to  a lack  of  repeatability  with  any  optimization  situation  rather  than  something  that 
is  unique  to  this  situation. 

It  should  be  noted  that  I had  a very  difficult  time  coming  up  with  data  that 
would  show  a difference  in  optima  when  assuming  a normal  distribution  rather  than 
a Weibull  distribution.  Initially  I thought  that  I could  use  any  randomly  generated 
set  of  data,  but  I instead  found  that  the  data  had  to  be  very  heavily  skewed  before 
any  difference  could  be  seen.  Perhaps  I could  have  more  easily  arrived  at  a suitable 
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set  of  data,  but  it  seemed  to  me  that  this  method  is  robust  to  the  assumption 
of  normality.  In  any  case,  the  practitioner  would  still  want  to  use  the  correct 
distributional  assumption  in  order  to  get  the  proper  estimate  of  the  probability  of 
being  acceptable. 


CHAPTER  4 

MULTIPLE  RESPONSE  OPTIMIZATION 
4.1  A Description  of  the  Proposed  Method 
As  with  the  other  methods  that  have  been  proposed  for  multiple  response 
optimization,  this  method  begins  with  an  experiment  capable  of  fitting  a second 
degree  polynomial  in  the  p control  factors  x = (aq, . . . ,xp)',  such  as  a central 
composite  design,  face  centered  cube,  or  a 3P  factorial.  At  each  of  the  n design 
points,  the  r responses  are  observed  as  t/jj,  with  i — 1, . . . , r and  j = 1, . . . , n.  Each 
of  the  responses  is  then  individually  fit  to  a full  second  order  response  surface  model, 
given  by 

p p p 

ill  — PiO  T ^ ] PijXj  + ^ ^ 'y  ^ Pijk%j%ki 
j= 1 j- 1 k>j 

where  the  ft  are  the  least  squares  estimates  of  the  true  regression  parameters.  This 
can  also  be  written  as 

Vi{x)  X Pi,  i 1, ...,  r, 

where  X is  the  design  matrix  of  control  variables,  their  powers,  and  their 

A i 1 

cross-products,  and  is  a vector  of  the  least  squares  estimate  of  the  r set 
of  regression  coefficients.  An  unbiased  estimator  (Gnanadesikan  1977)  of  the 
variance-covariance  matrix  is  given  by 

t = Y'[In  - X{X'X)-1X'}Yl{n  - q), 

where  In  is  a nxn  identity  matrix,  q is  the  number  of  terms  in  the  full  model,  and  Y 
is  the  rxn  matrix  formed  by  the  vectors  of  the  actual  responses  at  the  n experimental 
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settings.  Note  that  E is  assumed  to  be  constant  over  the  experimental  region,  so  E 
does  not  depend  on  x.  All  of  the  proposed  methods  are  essentially  the  same  up  until 
this  point.  Now  a function  of  fitted  responses  needs  to  be  maximized  or  minimized. 
The  functions  that  have  been  proposed  include  the  desirability  function  of  Derringer 
and  Suich,  the  distance  measure  of  Khuri  and  Conlon,  a weighted  loss  function  of 
Ames,  et  ah,  and  the  mean  squared  error  of  Vining. 

The  goal  of  optimization  in  the  case  of  a single  response  is  to  maximize  the 
probability  that  the  response  of  interest  will  be  within  its  specification  limits.  This 
is  the  same  goal  when  there  is  more  than  one  response  of  interest,  namely  maximize 
the  probability  that  all  of  the  responses  are  simultaneously  within  their  specification 
limits.  Therefore  the  approach  taken  for  multiple  response  optimization  will  be  a 
multivariate  generalization  of  the  previous  approach  for  a single  response.  The  actual 
optimization  is  done  by  finding  the  combination  of  control  settings  x that  maximizes 
the  probability  that  all  of  the  responses  ?/*  are  within  their  specification  limits, 
and  yrPax , under  the  assumption  that  the  response  vector  y is  multivariate  normal 


with  mean  vector  having  elements  equal  to  yi(x)  and  variance-covariance  matrix  E. 
The  quantity  to  be  maximized  is 


where  some  of  the  maxima  or  minima  may  be  ±oo.  If  a response  is  to  be  maximized 
or  minimized  (’larger  is  better’  or  ’smaller  is  better  ) then  the  appropriate  upper  or 
lower  limit  would  be  +oo  or  — oo,  respectively. 

The  proposed  probability  method  has  many  advantages  over  the  other 
methods  that  have  been  proposed  for  multiple  response  optimization.  One  of  the 
issues  that  was  raised  with  many  of  the  other  methods  had  to  do  with  targets 
that  had  to  be  specified  in  the  cases  where  the  response  was  to  be  maximized  or 
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minimized.  In  these  cases,  an  artificial  target  was  specified  that  was  based  on 
either  knowledge  of  the  situation,  the  absolute  maximum  or  minimum  that  could 
be  attained  in  the  region  of  interest,  or  some  other  unspecified  method.  The  only 
parameters  that  need  to  be  specified  for  the  probability  method  are  the  upper  and 
lower  limits  on  each  of  the  responses,  which  are  often  known.  When  the  response  is 
to  be  maximized  or  minimized,  the  upper  or  lower  limit  that  is  not  clearly  known 
is  merely  set  to  +oo  or  — oo,  respectively.  The  probability  method  is  not  trying  to 
reach  an  artificial  target  in  these  situations,  rather  it  is  trying  to  get  the  distribution 
as  far  from  the  specification  limits  as  is  reasonable.  The  idea  of  trying  to  keep  the 
response  away  from  its  specification  limits,  rather  than  get  it  close  to  a target,  is 
the  main  philosophical  difference  between  the  probability  method  and  the  other 
methods  that  have  been  proposed.  For  instance,  Figure  4.1  shows  a situation  of  two 
responses  which  are  to  be  maximized,  and  some  of  the  contours  of  constant  loss 
from  not  being  at  the  targets.  Note  that  points  A and  B have  the  same  loss,  though 
point  A is  out  of  specification  for  y\. 

A concern  with  the  desirability  method  of  Derringer  and  Suich  is  that  there 
may  be  a number  of  shape  parameters  that  need  to  be  specified  in  order  to  perform 
the  optimization.  In  the  absence  of  any  other  information,  these  shape  parameters 
are  typically  set  at  1.  These  shape  parameters  can,  however,  be  used  to  give  more 
or  less  weight  to  certain  responses,  or  to  influence  a certain  response  to  be  higher  or 
lower  due  to  economic  situations.  Even  with  good  knowledge  of  the  engineering  and 
economic  situations,  it  may  be  difficult  to  specify  the  shape  parameters  that  will 
reflect  the  true  situation  that  is  desired.  A similar  concern  arises  with  the  weight 
parameters  that  need  to  be  specified  with  the  weighted  loss  function  approach  of 
Ames,  et  al.  Similarly,  the  need  to  specify  a cost  matrix  is  a concern  with  the  mean 
squared  error  method  of  Vining.  The  probability  method  avoids  the  need  to  specify 
shape,  weight,  or  cost  parameters  by  relying  on  the  upper  and  lower  specifications 
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Figure  4.1:  Contours  of  Constant  Loss. 
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for  the  individual  responses.  Obviously,  if  the  shape,  weight,  or  cost  parameters  are 
easier  to  specify  than  the  specification  limits,  then  the  probability  method  may  not 
be  the  best  method  of  optimization.  In  many  industrial  situations,  however,  the 
specifications  are  clearly  defined,  whereas  the  shape,  weight,  and  cost  parameters 

are  not. 

The  proposed  method  does  utilize  the  fact  that  the  responses  may  be 
correlated,  which  is  not  the  case  for  the  methods  of  Derringer  and  Suich  or  Ames,  et 
al.  As  the  number  of  quality  characteristics  that  are  monitored  increases,  the  chances 
for  some  of  them  to  be  highly  correlated  increases.  Ignoring  possible  correlations 
may  lead  to  erroneous  conclusions,  where  highly  correlated  characteristics  unfairly 
weight  one  area  of  the  experimental  region  over  another. 

There  are  areas  where  the  probability  method  proposed  here  may  not  be 
the  best  approach  to  multiple  response  optimization.  One  situation,  discussed 
earlier,  deals  with  economic  trade-offs  that  may  need  to  be  dealt  with.  For  instance, 
if  a certain  response  can  be  inexpensively  reworked  if  it  falls  above  the  upper 
specification  limit,  but  must  be  scrapped  if  it  falls  below  the  lower  specification 
limit,  there  is  a definite  economic  advantage  to  keeping  the  response  toward  the 
higher  end  of  the  specification  range.  The  probability  method  is  not  currently  able 
to  deal  adequately  with  such  a situation,  though  it  may  be  possible  in  the  future. 
Another  situation  that  the  probability  method  may  not  be  suited  for  is  encountered 
often  in  the  food  industry,  where  there  are  several  responses  that  are  measured,  but 
only  a few  are  critical  to  the  success  of  the  product.  The  probability  method  will 
treat  all  responses  equally,  and  there  is  currently  no  methodology  to  adapt  this  to  a 
situation  where  the  responses  have  different  weights. 

4.2  Tire  Tread  Compound  Example 

Derringer  and  Suich  present  an  example  where  a tire  tread  compound  is  to  be 
optimized.  The  experiment  is  to  determine  the  optimal  setting  of  three  ingredients, 
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Table  4.1:  Experimental  Design. 


Compound  No. 

*i 

%2 

y i 

2/2 

2/3 

2/4 

1 

-1 

-1 

±1 

102 

900 

470 

67.5 

2 

±1 

-1 

-1 

120 

860 

410 

65 

3 

-1 

±1 

-1 

117 

800 

570 

77.5 

4 

±1 

±1 

±1 

198 

2294 

240 

74.5 

5 

-1 

-1 

-1 

103 

490 

640 

62.5 

6 

±1 

-1 

±1 

132 

1289 

270 

67 

7 

-1 

±1 

±1 

132 

1270 

410 

78 

8 

±1 

±1 

-1 

139 

1090 

380 

70 

9 

-1.633 

0 

0 

102 

770 

590 

76 

10 

±1.633 

0 

0 

154 

1690 

260 

70 

11 

0 

-1.633 

0 

96 

700 

520 

63 

12 

0 

1.633 

0 

163 

1540 

380 

75 

13 

0 

0 

-1.633 

116 

2184 

520 

65 

14 

0 

0 

±1.633 

153 

1784 

290 

71 

15 

0 

0 

0 

133 

1300 

380 

70 

16 

0 

0 

0 

133 

1300 

380 

68.5 

17 

0 

0 

0 

140 

1145 

430 

68 

18 

0 

0 

0 

142 

1090 

430 

68 

19 

0 

0 

0 

145 

1260 

390 

69 

20 

0 

0 

0 

142 

1344 

390 

70 

hydrated  silica  xx,  silane  coupling  agent  £2,  and  sulfur  level  x3.  The  properties  to 
be  optimized,  and  their  specifications  are: 


PICO  Abrasion  Index,  yx 

120  < 

2/i 

200%  Modulus,  i/2 

1000  < 

2/2 

Elongation  at  Break,  yz 

400  < 

2/3 

< 600 

Hardness,  y$ 

60  < 

2/4 

< 75 

A central  composite  design  with  20  runs  is  used  for  the  experiment.  The  central 
composite  design  consists  of  a full  23  factorial,  with  settings  coded  as  ±1,  axial  points 

coded  as  ±1.633,  and  six  center  points,  coded  as  0.  The  settings  and  responses  for 

« 

the  experiment  are  in  Table  4.1.  The  data  was  fit  to  a full  second  order  response 
surface  polynomial,  and  the  estimates  of  the  coefficients  are  in  Table  4.2. 
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Table  4.2:  Regression  Coefficients  and  Standard  Errors  for  Responses. 


A 

2/i 

A 

2/2 

A 

2/3 

A 

2/4 

A 

139.12 

1261.11 

400.38 

68.91 

A 

16.49 

268.15 

-99.67 

-1.41 

A 

@2 

17.88 

246.50 

-31.40 

4.32 

As 

10.91 

139.48 

-73.92 

1.63 

Ai 

-4.01 

-83.55 

7.93 

1.56 

A2 

-3.45 

-124.79 

17.31 

0.06 

&3 

-1.57 

199.17 

0.43 

-0.32 

&2 

5.13 

69.38 

8.75 

-1.63 

As 

7.13 

94.13 

6.25 

0.13 

As 

7.88 

104.38 

1.25 

-0.25 

Std.  Error 

5.61 

328.69 

20.55 

1.27 

A comparison  of  all  of  the  proposed  methods  will  be  done  using  the 
regression  equations  given  in  Table  4.2,  and  limits  given  by  Derringer  and  Suich. 
The  optimization  will  be  performed  within  a sphere  of  radius  1.633.  Additional 
parameters  need  to  be  specified  for  some  of  the  methods.  The  Derringer  and  Suich 
method  uses  artificial  upper  limits  of  170  and  1300  for  the  first  two  responses, 
respectively.  All  of  the  shape  parameters  are  set  to  1.  The  Khuri  and  Conlon 
method  would  probably  use  192  and  2271  as  targets  for  the  first  two  responses, 
these  being  the  maximum  values  that  can  be  attained  by  these  responses  within 
the  experimental  region.  Ames  et  al.  use  120  and  1000  as  targets  for  the  first  two 
responses,  though  these  targets  are  probably  ill  conceived,  as  they  are  the  lower 
specification  limits  on  the  responses.  A better  choice  would  have  been  the  upper 
limits  specified  by  Derringer  and  Suich.  The  cost  matrix  used  for  Vining’s  method 
is  C — S 1,  which  is  based  on  a multivariate  control  statistic.  The  results  are  in 
Table  4.3  for  the  proposed  probability  method,  the  desirability  method  of  Derringer 
and  Suich,  the  distance  method  of  Khuri  and  Conlon,  the  weighted  loss  method  of 
Ames  et  ah,  and  the  mean  square  error  (MSE)  method  of  Vining.  The  probability 
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Table  4.3:  Comparison  of  Optimization  Methods. 


Method 

Proposed 

Desirability 

Distance 

Wgtd  Loss 

MSE 

Location 

x1 

0.329 

-0.050 

0.534 

-0.461 

0.073 

x2 

0.863 

0.145 

1.536 

-0.283 

0.408 

xz 

-1.244 

-0.868 

-0.148 

-0.528 

-0.549 

Response 

/S 

2/i 

131.1 

129.4 

166.3 

122.7 

138.7 

A 

2/2 

1464 

1300 

1474 

1069 

1318 

A 

2/3 

445.4 

465.7 

359.4 

500.3 

423.6 

A 

2/4 

69.6 

68.0 

73.8 

67.5 

69.6 

Probability 

0.886 

0.781 

0.020 

0.403 

0.719 

listed  is  the  probability  that  all  of  the  responses  are  within  their  specification,  or 


P(120  < yi,  1000  < y2 ; 400  < y3  < 600; 60  < y4  < 75) 

■600  /*75 


‘OO  POO 


120  j 1000  J 400 


r 0 ~ - -i 

/ (27r)“2|i)| ~1/2exp((y  - y(x))"Z  (y-y(x)))dy, 

J 60 


assuming  that  the  response  vector  y is  multivariate  normal  with  mean  vector 
y(x ) and  variance-covariance  matrix  S.  The  Fortran  program  that  calculated  this 
probability  is  contained  in  Appendix  B.  The  estimate  of  the  variance-covariance 
matrix  is 


31.49 

34.78 

-3.14 

1.13  ' 

34.78 

108039.33 

-1489.08 

30.36 

-3.14 

-1489.08 

422.27 

-1.36 

1.13 

30.36 

-1.36 

1.61  j 

As  the  table  shows,  the  proposed  probability  method  outperforms  the  other 
methods  when  the  probability  of  being  within  specification  is  compared.  The 
probability  of  0.886  is  much  larger  than  Derringer  and  Suich’s  0.781,  and  Vining’s 
0.719.  The  other  two  methods  perform  rather  poorly  in  comparison,  with  Khuri 
and  Conlon  predicting  a probability  of  0.020,  and  Ames  et  al.  predicting  0.403. 
Derringer  and  Suich’s  method  seems  to  suffer  from  the  choice  of  the  artificial  upper 
limit  on  y2.  They  choose  an  upper  limit  of  1300,  above  which  the  product  is  not 
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thought  to  have  any  added  value,  and  their  optimal  solution  does  indeed  hit  1300. 
This  level,  however,  is  only  300  above  the  lower  limit  of  1000,  which  is  less  than 
the  standard  error  of  327.  The  probability  method,  on  the  other  hand,  has  this 
response  at  1464,  or  roughly  one  and  a half  times  the  standard  error.  Vining’s  MSE 
method  uses  the  same  artificial  upper  limit  of  1300  for  y 2 and  suffers  similarly  for 
it.  The  weighted  loss  method  of  Ames  et  al.  used  a target  of  1000  for  y2,  which  was 
not  a good  choice.  The  resulting  optimum  has  y 2 = 1069,  which  is  barely  inside 
the  specification  limit  of  1000.  The  Khuri  and  Conlon  method  suffers  the  most 
by  ignoring  any  of  the  specifications  and  relying  solely  on  targets.  Their  target 
for  y\  is  192,  and  their  resulting  optimum  has  y\  — 166.3,  which  is  46.3  above  the 
lower  limit  of  120.  In  terms  of  the  standard  error,  46.3  translates  to  8.25  times  the 
standard  error  of  5.61,  clearly  much  farther  away  from  the  specification  limit  than  it 
needs  to  be.  For  comparison,  the  proposed  probability  method  has  yi  = 131.1,  or 
approximately  2 times  the  standard  error  above  the  specification  limit  of  120. 

In  the  above  comparison,  Vining’s  MSE  method  suffers  greatly  due  to  the 
choice  of  1300  as  an  upper  limit  for  y 2.  Since  Vining  did  not  use  this  example  in 
his  paper,  it  is  not  clear  whether  he  would  have  selected  1300  as  an  appropriate 
upper  limit,  since  much  of  the  experimental  data  is  at  or  above  this  level.  In  order 
to  investigate  the  effect  of  this  choice,  the  analysis  is  done  for  several  choices  of  an 
upper  limit,  with  the  results  being  given  in  Table  4.4.  The  final  upper  limit  used  is 
slightly  larger  than  the  largest  value  of  2/2  observed  in  the  experiment.  It  is  obvious 
that  a higher  upper  limit  for  ?/2  improves  the  solution  in  terms  of  the  probability 
of  being  within  specification,  but  it  is  this  dependence  on  the  choice  of  an  upper 
limit  that  causes  problems  for  this  method.  A lower  specification  limit  is  given 
for  y\  and  y^ , and  each  response  should  be  as  large  as  possible.  Trying  to  define 
what  an  appropriate  upper  level  should  be  can  be  quite  difficult.  It  is  assumed  that 
Derringer  and  Suich  used  upper  limits  based  on  their  knowledge  of  the  process  and 
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Table  4.4:  MSE  Method  for  Different  Choices  of  y2 s Upper  Limit. 


Limit 

*s) 

/ A A 

\y  1,2/2, 

2/3,  2/4) 

MSE 

Prob 

1400 

(0.106, 

0.447, 

-0.587) 

(138.9, 

1331, 

422.6, 

69.6) 

53.8 

0.7180 

1600 

(0.110, 

0.446, 

-0.616) 

(138.5, 

1333, 

424.3, 

69.6) 

55.6 

0.7351 

1800 

(0.114, 

0.445, 

-0.647) 

(138.0, 

1335, 

426.2, 

69.5) 

58.2 

0.7520 

2000 

(0.118, 

0.442, 

-0.679) 

(137.4, 

1338, 

428.2, 

69.4) 

61.5 

0.7684 

2200 

(0.122, 

0.438, 

-0.712) 

(136.9, 

1341, 

430.3, 

69.3) 

65.5 

0.7842 

2400 

(0.125, 

0.434, 

-0.747) 

(136.3, 

1345, 

432.7, 

69.2) 

70.2 

0.7990 

product  that  they  were  experimenting  with.  If  there  is  not  enough  information  to 
set  an  appropriate  upper  limit,  other  methods  may  be  attempted.  The  method 
proposed  by  Khuri  and  Conlon  picked  the  largest  fitted  value  that  could  be  achieved 
in  the  experimental  region.  As  the  above  example  shows,  such  a choice  may  result 
in  a solution  that  has  one  of  the  responses  well  outside  of  its  specification  limits.  A 
similar  situation  can  exist  with  the  MSE  approach.  Setting  the  upper  limit  for  y2 
to  2400  should  imply  that  the  upper  limit  for  yi  should  be  set  at  200,  since  both 
values  are  slightly  larger  than  the  largest  value  observed  in  the  experiment.  These 
choices  for  the  upper  limit  will  give  a result  that  is  even  worse  than  that  of  Khuri 
and  Conlon.  The  maximum  MSE  is  achieved  at  x = (0.611,1.514,0.009),  which 
gives  a response  vector  y = (171.8, 1548,342.1,73.8)  which  results  in  a probability 
of  meeting  all  specifications  of  0.002.  Problems  with  the  definition  of  an  appropriate 
upper  limit  for  ’larger  is  better’  situations  illustrates  the  advantage  of  using  the 
proposed  probability  method.  The  probability  method  is  not  trying  to  get  the 
response  as  close  to  some  chosen  target,  but  is  trying  to  get  it  as  far  as  possible  from 
the  specification  limit. 

4.3  Repeatability  of  a Multiresponse  Optimization  Experiment 
One  issue  that  needs  to  be  dealt  with  in  optimization  experiments  is  the 
question  of  how  close  the  final  optimal  solution  is  to  the  true  optimal  solution. 
One  way  to  look  at  this  question  is  to  define  the  true  functional  relationship  of  the 
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response  variables  to  the  control  variables,  as  well  as  the  true  variability  associated 
with  the  responses,  and  repeatedly  rerun  the  experiment  using  randomly  generated 
data.  If  the  experiment  is  simulated  a large  number  of  times,  then  some  information 
about  the  closeness  of  the  estimated  final  solution  to  the  true  solution  can  be 
measured. 

A simulation  was  carried  out  starting  with  the  Derringer  and  Suich  data  set. 
The  true  functional  relationship  is  defined  as 

Ui  — Xf3i  + £»,*=  1, 2, 3, 4, 

where  the  elements  of  /?  are  the  regression  coefficients  from  the  fit  of  the  Derringer 
and  Suich  data,  whose  estimates  are  listed  in  Table  4.2,  and  e;  is  normal  with 
mean  0 and  standard  deviation  equal  to  the  standard  error  from  the  fit  of  the 
Derringer  and  Suich  data,  also  listed  in  Table  4.2.  Given  this  information,  the  20 
run  central  composite  design  can  be  repeatedly  simulated,  the  model  fit,  and  the 
optimal  solutions  found  for  both  the  Derringer  and  Suich  method  and  the  currently 
proposed  probability  method.  This  was  done  5000  times  with  the  results  shown 
in  Figures  4.2  and  4.3.  Figure  4.2  gives  the  distribution  of  each  element  of  the 
deviation  x - xopt  = (aq  - 0.329,  x2  - 0.863,  x3  + 1.244),  where  xopt  is  the  optimal 
solution  found  by  the  probability  method.  Figure  4.3  gives  the  Euclidean  distance 
between  each  solution  x and  the  ideal  maximum,  as  well  as  the  distribution  of  the 
true  probabilities  found  for  each  method.  Table  4.5  gives  the  statistical  information 
contained  in  the  figures.  It  should  be  noted  that  the  simulation  was  done  on  a 
Gateway  personal  computer  with  a 333  MHz  Pentium  processor  in  Microsoft  Fortran 
Powerstation  and  took  appoximately  16  hours  to  run.  The  Fortran  program  is  listed 
in  Appendix  B. 

The  interpretation  of  the  graphs  is  somewhat  difficult,  as  the  desirability 
graphs  for  the  individual  elements  of  the  x vector  from  Derringer  and  Suich’s 
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Figure  4.2:  Distance  from  True  Optimal  x for  Probability  and  Desirability  Methods 
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Figure  4.3:  Euclidean  Distance  from  True  Optimum  and  Probability  for  Probability 
and  Desirability  Methods. 
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Table  4.5:  Repeatability  of  Optimization  Methods. 


Probability 
Mean  Std.  Dev. 

Desirability 
Mean  Std.  Dev. 

%l,opt 
%2  %2,opt 

^3  *^3,  opt 

Distance  from  Optimum 
Probability 

0.105  0.224 

0.177  0.354 

-0.169  0.317 

0.497  0.330 

0.810  0.079 

0.336  0.170 

0.586  0.270 

-0.301  0.253 

0.357  0.250 

0.768  0.076 

method  are  going  to  deviate  more  from  the  optimal  solution  xopt  since  they  are  being 
compared  to  a solution  from  the  proposed  probability  method.  For  this  reason,  the 
central  location  of  those  graphs  are  further  from  0.0  than  the  same  graphs  for  the 
probability  method.  The  variability  of  both  methods  are  roughly  equivalent,  which 
points  out  that  the  precision  of  the  two  methods  are  approximately  the  same.  The 
variability  associated  with  either  method  is  larger  than  what  would  be  hoped  for  in 
an  optimization  experiment.  The  distance  graphs  give  a similar  interpretation,  with 
variability  being  high  for  both  methods,  and  the  probability  method  being  closer  to 
0.0  on  the  whole.  The  true  probability  graphs  show  that  the  probability  method 
gives  an  optimal  solution  that  has  a much  better  probability  of  being  within  all 
specifications  than  the  solution  provided  by  the  desirability  method  of  Derringer 
and  Suich. 

4.4  Albumin  Nanospheres  Example 

Muller  et  al.  (1996)  describe  an  experiment  to  optimize  the  manufacturing 
process  of  albumin  nanospheres.  Albumin  particles  have  been  used  as  drug  carrier 
systems,  and,  if  small  enough,  they  may  be  utilized  as  a site-specific  drug  carrier. 
This  site-specific  manner  of  drug  delivery  could  eliminate  possible  severe  adverse 
drug  reactions  due  to  systemic  application.  The  three  responses  of  interest  are  the 
yield,  ?/i,  particle  size,  y2,  and  polydispersity  index,  y3.  The  yield  is  calculated  as  the 
weight  of  the  dried  microspheres  in  relation  to  the  sum  of  the  starting  material,  and 
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Table  4.6:  Control  Factors  and  Levels. 


Factor 

-1.664 

-1 

0 

+1 

+1.664 

Albumin  (%  w/v) 

5 

11 

20 

29 

35 

Aqueous  phase  volume  (%  v/v) 

1.3 

3.4 

6.5 

9.6 

11.7 

Duration  of  Emulsification  (min) 

5 

9 

15 

21 

25 

Glutaraldehyde  (mmol) 

0.2 

2.8 

6.6 

10.5 

13 

Drug  (mg) 

0 

10 

25 

40 

50 

should  be  made  as  large  as  possible.  The  particle  size  is  obtained  from  a scanning 
electron  microscope,  and  is  a measure  of  the  mean  diameter  of  the  microspheres, 
and  should  be  made  as  small  as  possible.  The  polydispersity  index  is  a measure  of 
the  width  of  the  size  distribution,  and  should  be  made  as  small  as  possible. 

A five-factor  orthogonal  central  composite  design  was  run  to  find  the  optimal 
settings  of  the  control  factors  that  will  produce  the  most  desirable  nanospheres.  The 
five  factors  that  are  thought  to  affect  the  production  are  albumin  concentration, 

Xi,  aqueous  phase  volume,  x%,  duration  of  emulsification,  £3,  glutaraldehyde 
concentration,  X4,  and  amount  of  drug,  x 5.  The  central  composite  design  consists  of 
a 25_1  fractional  factorial,  axial  points,  and  three  center  point  replicates.  The  coded 
and  actual  levels  of  the  factors  are  listed  in  Table  4.6. 

The  experimenters  took  a desirability  function  approach  to  finding  the 
optimal  settings,  in  the  manner  of  Derringer  and  Suich.  The  limits  for  the  responses 
of  interest  are  given  in  Table  4.7.  The  original  analysis  in  the  paper  does  not  adhere 
to  the  exact  desirability  approach  presented  by  Derringer  and  Suich  in  their  original 
paper.  The  purpose  of  the  experiment  was  to  obtain  information  about  the  process 
in  order  to  design  a new  process  which  would  result  in  smaller  nanospheres  with  a 
smaller  polydispersity.  For  this  reason,  they  do  not  actually  recommend  an  optimal 
setting  of  the  control  factors,  they  merely  interpret  the  response  surface  graphs  that 
they  generated,  in  order  to  assist  in  the  design  of  the  new  process.  The  data  from 
this  experiment  will  be  used  to  compare  the  method  that  has  been  proposed  here 
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Table  4.7:  Response  Variables  and  Limits. 


Variable 

Lower  Limit,  y"an 

Upper  Limit,  y™ax 

Goal 

Yield  (%  w/w) 

50 

100 

Maximize 

Particle  Size  (nm) 

0 

500 

Minimize 

Polydispersity  Index 

0 

0.2 

Minimize 

with  the  desirability  method  of  Derringer  and  Suich.  In  order  to  compare  the  two 
methods,  the  data  will  be  used  to  fit  the  individual  responses  to  full  second  order 
polynomial  models  in  the  control  settings.  These  models  will  be  used  to  generate 
the  optimal  settings  for  the  two  methods.  In  addition,  the  probability  approach  will 
need  to  estimate  the  variance-covariance  matrix  of  the  responses. 

The  data  from  the  experiment  can  be  found  in  Table  4.8.  After  an  initial 
attempt  to  analyze  the  data  as  it  is,  it  was  determined  that  the  models  for  size  y 2 
and  polydispersity  y3  would  be  better  fit  after  a log  transform.  The  naive  approach 
of  analyzing  the  data  without  a transformation  led  to  estimates  of  the  responses 
at  the  optimal  settings  that  were  negative.  Clearly  a negative  size  and  a negative 
measure  of  dispersion  were  undesirable.  The  need  to  transform  these  data  should 
not  be  surprising  since  both  of  the  distributions  of  the  responses  are  quite  skewed. 

For  simplicity,  all  of  the  responses  were  fit  to  full  second  order  polynomials,  given  by 

5 55 

PiO  + Pijxj  + 

j= 1 j=l  k>j 

for  i = 1,2,3,  where  the  ft  are  the  least  squares  estimates  of  the  true  regression 
parameters.  The  fitted  parameters  for  the  responses  are  in  Table  4.9. 

The  two  methods  used  to  find  the  optimal  settings  of  the  control  factors  are 
the  probability  method  and  the  desirability  method.  The  probability  method  seeks 
the  location  x that  maximizes  the  probability  P(?/i  > 50;  2/2  < 500;  2/3  < 0.2),  where 
the  transformed  response  vector  (yi,ln(y2)-,ln(yz)y  is  assumed  to  be  multivariate 
normal  with  mean  vector  given  by  (y\-,ln(y2),ln{yz))'  and  variance-covariance  matrix 


A 
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Table  4.8:  Experimental  Data. 


Xl 

%2 

^3 

£4 

£5 

Vi 

2/2 

2/3 

-1 

-1 

-1 

-1 

+1 

78 

222 

0.077 

+1 

-1 

-1 

-1 

-1 

78 

187 

0.083 

-1 

+ 1 

-1 

-1 

-1 

12 

1186 

0.586 

+1 

+ 1 

-1 

-1 

+1 

89 

286 

0.274 

-1 

-1 

+1 

-1 

-1 

83 

323 

0.232 

+1 

-1 

+1 

-1 

+1 

80 

182 

0.112 

-1 

+ 1 

+1 

-1 

+1 

8 

1106 

0.884 

+1 

+ 1 

+1 

-1 

-1 

93 

321 

0.313 

-1 

-1 

-1 

+1 

-1 

91 

277 

0.178 

+1 

-1 

-1 

+1 

+1 

69 

263 

0.273 

-1 

+ 1 

-1 

+1 

+1 

15 

1781 

0.878 

+1 

+ 1 

-1 

+1 

-1 

81 

296 

0.294 

-1 

-1 

+1 

+1 

+1 

89 

324 

0.267 

+1 

-1 

+1 

+1 

-1 

70 

197 

0.110 

-1 

+ 1 

+1 

+1 

-1 

75 

228 

0.089 

+1 

+ 1 

+1 

+1 

+1 

11 

708 

0.664 

-1.664 

0 

0 

0 

0 

90 

330 

0.350 

+1.664 

0 

0 

0 

0 

93 

372 

0.394 

0 

-1.664 

0 

0 

0 

76 

162 

0.107 

0 

+1.664 

0 

0 

0 

7 

1486 

0.718 

0 

0 

-1.664 

0 

0 

83 

2024 

0.498 

0 

0 

+1.664 

0 

0 

99 

381 

0.394 

0 

0 

0 

-1.664 

0 

68 

234 

0.093 

0 

0 

0 

+1.664 

0 

81 

215 

0.039 

0 

0 

0 

0 

-1.664 

73 

220 

0.126 

0 

0 

0 

0 

+1.664 

84 

220 

0.093 

0 

0 

0 

0 

0 

97 

218 

0.101 

0 

0 

0 

0 

0 

82 

221 

0.073 

0 

0 

0 

0 

0 

74 

217 

0.083 

76 


Table  4.9:  Analysis  of  Regression. 


Vi 

ln(y2) 

ln{y3) 

A 

86.254 

5.589 

-2.260 

A 

A 

5.803 

-0.205 

-0.083 

A 

A 

-17.124 

0.487 

0.517 

A 

A 

1.050 

-0.172 

-0.020 

A 

0.076 

0.007 

-0.169 

A 

-5.836 

0.129 

0.157 

Ax 

1.428 

0.049 

0.409 

A2 

-16.630 

0.170 

0.303 

As 

1.248 

0.081 

0.472 

A4 

-4.711 

-0.112 

-0.248 

As 

-3.267 

-0.119 

-0.037 

4x2 

13.000 

-0.127 

0.014 

4x3 

-7.625 

0.142 

0.044 

4x4 

-12.375 

0.137 

0.182 

As 

-1.000 

-0.085 

-0.114 

A4 

-1.250 

-0.062 

-0.201 

As 

-8.250 

0.169 

0.179 

a4 

-1.125 

-0.107 

-0.245 

As 

-7.625 

0.113 

0.157 

As 

-7.625 

0.248 

0.309 

R2 

0.939 

0.870 

0.931 

Adj  R2 

0.785 

0.544 

0.757 

4Mse 

13.311 

0.494 

0.428 

77 


given  by 


A 


177.17 

-3.67 

-3.13 

-3.67 

0.24 

0.13 

-3.13 

0.13 

0.18 

\ 

/ 


The  desirability  method  seeks  the  location  x which  maximizes  an  overall  desirability 
function  D = (did2d3)1/3,  where  di,  the  desirability  of  the  first  response,  is  defined 
by: 


j/i~ 50 
100-50 


dx  = < 0 


\ 


1 


if  50  <yi<  100 
if  yi  < 50 
if  yi  > 100 


and  d2  and  d3,  the  desirabilities  of  the  second  and  third  responses,  respectively,  are 


defined  by: 


r max 

IJi  yi 


y 


max 


ifO  <yi<y\ 


max 

i 


di  = < 


0 


if  m > y 


max 

i 


if  ili  < o 


where  the  y?ax  = 500  and  y^ax  = 0.2. 

The  optimal  settings  given  by  each  method  are  given  in  Table  4.10.  The 
probability  listed  is  the  probability  for  the  given  location  x,  P(?/i  > 50;  2/2  < 

500;  y3  < 0.2).  The  probability  method  results  in  a solution  that  predicts  a smaller 
size  and  polydispersity  than  the  desirability  method,  which  is  the  result  that  was 
sought  by  the  optimization  experiment.  The  yield  that  is  predicted  is  also  smaller, 
though,  which  is  not  what  is  desired.  The  lower  yield  is  still  well  above  the  lower 
limit  of  50  that  was  given.  Since  the  standard  error  for  yield  is  13.39,  the  predicted 
yield  of  75.79%  is  approximately  two  standard  deviations  above  the  lower  limit  of 


50. 
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Table  4.10:  Optimal  Settings. 


Probability 

Desirability 

Xi 

0.542 

-0.059 

-0.533 

0.085 

£3 

0.257 

0.095 

£4 

1.42 

0.008 

£5 

-0.326 

0.083 

A 

2/1 

75.79 

83.79 

ln(y2) 

5.09 

5.64 

ln(y3) 

-2.83 

-2.19 

A 

2/2 

161.60 

282.14 

/S 

2/3 

0.059 

0.112 

Probability 

0.9303 

0.7206 

It  should  be  noted  that  the  assumption  of  normality  is  used  to  greatly  ease 
the  calculations  that  have  been  made.  In  the  above  example,  two  responses  were 
transformed  to  make  the  assumptions  of  normality  more  realistic.  If  one  were 
ambitious  enough,  a different  distribution  could  be  assumed  for  the  responses,  instead 
of  using  a transformation.  For  instance,  the  above  approach  used  a log  transform, 

i 

assumed  that  the  transformed  response  was  normal,  whose  mean  depended  on  the 
control  variables,  and  the  variance  was  constant  over  the  experimental  area.  Instead 
of  this,  the  experimenter  could  have  assumed  a Weibull  distribution,  with  the  scale 
parameter  depending  on  the  control  variables,  and  the  shape  parameter  being 
constant.  The  optimization  would  still  be  to  maximize  the  probability  of  being 
within  specifications,  but  the  distributional  assumptions  would  be  different.  The 
probability  of  being  within  specifications  is  a multiple  integral  that  would  have  to 
be  calculated,  but  it  could  be  performed,  if  desired. 


CHAPTER  5 

MULTIPLE  RESPONSE  ROBUST  PARAMETER  DESIGN 
5.1  A Description  of  the  Proposed  Method 
In  situations  where  there  are  multiple  responses  and  the  variance  of  one  or 
more  of  these  responses  cannot  be  assumed  to  be  constant  over  the  experimental 
region,  the  same  method  that  has  been  used  in  other  scenarios  may  be  used,  i.e. 
maximize  the  probability  that  the  responses  are  within  their  specified  limits  under 
the  assumption  of  normality.  This  method  is  an  extension  of  the  multiple  response 
optimization,  as  well  as  the  robust  parameter  design.  Since  the  variance  cannot 
be  assumed  to  be  constant  over  the  experimental  region,  both  the  response  and 
the  standard  deviation  will  be  modeled  from  the  experimental  data  for  each  of  the 
responses  of  interest.  These  models  will  then  be  used  to  estimate  the  probability 
that  the  responses  are  within  their  respective  limits.  The  final  optimal  setting  of  the 
control  factors  x will  be  that  combination  that  maximizes  the  probability  of  meeting 
all  specifications. 

Let  X{  be  a pxl  vector  representing  the  ith  setting  of  the  control  variables  and 
Dij  be  the  jth  response  of  interest  at  Xi,  for  i = 1, . . . , n and  j = 1, . . . , r.  Suppose 
that  an  appropriate  linear  model  is 

Vij  — f'(xi)Pj  + Uj>  (5-1) 

where  f(xi)  is  a qmx  1 vector  representing  the  appropriate  polynomial  expansion  of 
Xi  for  the  assumed  model  for  the  responses,  and  f3j  is  a qmx  1 vector  of  unknown 
coefficients  for  the  jth  response,  and  the  e*.,-  are  normally  distributed  with  mean  0 and 
variance  of-.  Furthermore,  let  the  errors  of  the  jth  response  be  independent,  or  let 
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€ij  be  independent  of  e^j,  i ^ i! . In  other  words,  the  errors  for  a given  response  are 
assumed  to  be  independent,  but  correlation  exists  between  the  different  responses. 
Consider  an  appropriate  transformation  of  the  variance  h(Tij)  = ajj,  with  a linear 
predictor  defined  by 

Tij  = g{xi)  'jj,  (5.2) 

where  g(xi ) is  a qvx  1 vector  representing  the  appropriate  polynomial  expansion  of 
Xi  for  the  assumed  model  for  the  function  of  the  variance,  and  'Yj  is  a qvxl  vector  of 
unknown  coefficients  for  the  jth  function  of  the  variance.  Common  transformations 
of  the  variance  include  the  log  transformation  (r^  = log{afj))  or  the  square  root 
transformation  (r^  = <7y). 

The  first  step  of  the  optimization  process  will  be  selecting  an  appropriate 
experimental  design  that  will  allow  for  the  estimation  of  the  models  for  both  the 
average  response  and  the  variance  of  the  response  (or  an  appropriate  function  of  the 
variance).  The  model  for  the  mean  response  is  usually  assumed  to  be  a second  order 
polynomial,  so  a response  surface  design  is  typically  used.  Since  the  variance  will 
also  need  to  be  modeled,  some  replication  of  this  design  will  be  necessary  in  order 
to  estimate  the  variance  at  different  points  in  the  experimental  design.  In  addition 
to  the  average  response  and  the  variance,  some  estimate  of  the  correlation  of  the 
responses  will  be  necessary.  With  all  of  this  estimation  needed,  the  experimental 
design  may  need  to  be  very  large,  which  could  make  the  experiment  expensive  and 
difficult  to  carry  out. 

A central  composite  design  with  some  replication  will  allow  estimation  of  the 
functions  of  the  response  of  interest  and  the  variance,  but  the  correlation  structure 
will  still  need  to  be  estimated.  If  the  correlation  structure  is  assumed  to  be  constant 
over  the  experimental  region,  the  addition  of  replicated  center  points  could  be  used 
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to  estimate  the  correlation  of  the  responses  of  interest,  given  by 

nc 

Pfj  = T(ykJ  - yoj)(ykj'  - yoj')/(sojsoj'), 

k— 1 

where  nc  is  the  number  of  times  the  center  point  has  been  replicated,  and  y0j 
and  s0j  are  the  mean  and  the  standard  deviation  of  the  jth  response  at  the  center 
point.  Though  the  assumption  of  a constant  correlation  structure  simplifies  the 
optimization  procedure,  it  may  be  a valid  assumption  in  many  cases,  especially 
where  physical  characteristics  are  the  responses  of  interest  and  their  correlation  is 
inherent  to  the  product.  Another  possible  way  to  estimate  the  correlation  structure 
is  to  use  the  replicated  points  to  form  a pooled  estimate  of  the  correlation.  In  this 
case  the  correlation  can  be  based  on  YSC[I  — X(X' X)~lX']Y sc  divided  by  the 
appropriate  number  of  degrees  of  freedom,  where  Y sc  is  the  matrix  of  responses, 
scaled  by  the  estimated  standard  deviation,  or  y^/s ij.  This  estimate  depends  on  the 
assumed  model,  thus  any  model  misspecification  could  lead  to  a poor  estimate  of 
the  correlation  coefficients. 

Vining  and  Schaub  (1996)  give  some  recommendations  for  experimental 
designs  that  will  be  able  to  estimate  the  mean  and  the  variance  of  a response  and 
will  still  be  small  enough  to  be  practical  to  run.  One  situation  of  interest  is  when 
the  response  of  interest  is  assumed  to  be  a second  order  response  surface  model  of 
the  control  variables,  and  the  variance  (or  appropriate  function  of  the  variance)  is 
assumed  to  be  a linear  function  of  the  control  variables.  In  this  case  they  suggest 
that  a central  composite  design  be  used  as  the  basis  for  the  experiment.  The  full 
design  will  be  used  to  estimate  the  second  order  response  surface  of  the  response  of 
interest.  In  order  to  estimate  the  linear  function  of  the  variance,  the  factorial  points 
of  the  central  composite  design  will  be  replicated,  which  will  allow  for  estimates  of 
the  variance  at  these  points.  The  factorial  portion  of  the  design  will  be  used  to 
estimate  the  linear  function  of  the  variance. 
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Suppose  that  an  appropriate  experiment  has  been  run.  Let  be  the  nxl 
vector  of  the  jth  response,  and  let  Tj  be  the  n„xl  vector  of  the  jth  linear  predictor. 
Here  nv  represents  the  number  of  distinct  design  points  that  have  been  replicated. 
From  equation  (5.1),  we  obtain 


Vj  ~ XPj  + ei 

where  X'  = [/(aq), . . . , f(xn)}.  Similarly,  from  equation  (5.2),  we  obtain 


Tj  = Zjj 

where  Z'  = [y(aq), . . . ,g(xn)].  At  this  point,  the  least  squares  estimates  could  be 
used  to  obtain  estimates  of  the  unknown  parameters  /3j  and  for  j = 1, . . . , r. 

It  is  now  possible  to  estimate,  for  a given  setting  of  the  control  factors  x,  the 
mean  of  each  of  the  responses  and  a function  of  the  variance  for  each  of  the  responses. 
These  are  given  by  Vj{x)  = f(x)/3j  and  fj(x)  = g(x)'jj  for  j = 1, . . . ,r,  where 
= (X'X)~1X'yj  and  7 j = (Z'Z)~1ZlTj.  In  addition,  the  variance-covariance 
matrix  of  the  responses  can  be  estimated  by  S(»),  whose  ith  diagonal  element  is 
h(fj),  and  the  ijth  off-diagonal  element  is  p*j\/h(fj)/i(fj).  The  optimal  setting  of  the 
control  factors  is  the  combination  x that  maximizes  the  probability  that  all  of  the 
responses  y are  within  their  specified  limits,  or 


The  probability  is  calculated  by  assuming  that  the  response  vector  y is  multivariate 

A 

normal  with  mean  vector  y(x)  and  variance-covariance  matrix  E(a:).  For  responses 
that  are  to  maximized,  the  upper  limit  is  set  equal  to  +00,  and  for  responses  that 
are  to  be  minimized,  the  lower  limit  is  set  to  —00. 

This  method  of  maximizing  the  probability  of  being  within  specification  has 
the  appeal  that  it  is  an  extension  of  the  methods  proposed  in  previous  chapters. 
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This  will  allow  an  experimenter  to  proceed  in  a similar  fashion  for  any  optimization 
problem.  Each  experimental  situation  will  be  slightly  different  with  regards  to  the 
choice  of  experimental  design,  methods  for  estimating  the  parameters  of  interest, 
and  the  models  that  are  assumed  to  be  correct,  but  the  optimal  setting  of  the  control 
factors  x will  be  found  in  the  same  way,  i.e.  maximizing  a probability. 

Pignatiello  is  the  only  other  person  to  propose  a method  of  optimization  for 
the  situation  of  multiple  responses  whose  variance  cannot  be  assumed  constant. 

He  proposes  replicating  every  point  in  an  experimental  design  and  minimizing  an 
estimate  of  the  mean  squared  error,  given  by 

-R(flj)  = trace (CS (a;))  T (y(aj)  Vtarget ) ^(sK®)  Vtarget) i 

where  C is  a cost  matrix  that  must  be  specified.  This  method  requires  that 
the  quantity  R(x)  be  found  for  each  point  in  the  experimental  design,  and  this 
quantity  (or  a function  of  this  quantity)  is  to  be  fit  to  a predictive  model  in  the 
control  variables  x.  This  approach  needs  a separate  estimate  of  the  mean  and 
variance-covariance  matrix  at  each  point  in  the  experimental  design.  The  obvious 
estimate  of  the  mean  is  the  average  of  the  replicated  runs  at  that  point  y(x).  The 
estimate  of  the  variance-covariance  matrix  at  a given  experimental  point  is  given  by 
S(s),  whose  ijth  element  is  given  by 

nr 

^2(yik  -Vi)(yjk  -yj)/nr, 

k= 1 

where  nr  is  the  number  of  times  that  the  design  point  has  been  replicated. 

The  mean  squared  error  method  proposed  by  Pignatiello  does  have  some 
problems  that  need  to  be  addressed.  The  requirement  that  the  quantity  R(x)  be 
calculated  at  every  design  point  means  that  the  entire  experiment  must  be  replicated 
enough  times  to  allow  for  proper  estimation  of  all  of  the  quantities  in  the  equation. 
This  total  replication  may  lead  to  a very  large  experiment.  A smaller  experiment 
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may  be  obtained  by  the  use  of  partial  replication,  as  described  above.  One  could 
easily  use  the  same  estimations  of  the  response  and  the  variance-covariance  matrix 
as  are  used  for  the  proposed  probability  method,  and  then  find  the  location  of  the 
control  variables  x that  minimized  the  quantity  R(x).  This  would,  in  essence, 
replace  the  probability  function  with  the  mean  squared  error  function,  with 
everything  else  remaining  essentially  the  same. 

A more  serious  problem,  perhaps,  is  that  of  selecting  a target  in  the  cases  of 
larger  or  smaller  is  better,  and  of  selecting  the  values  for  the  cost  matrix.  Pignatiello 
admits  that  the  targets  are  only  applicable  in  the  case  of  ’target  is  best,’  and  he  does 
not  recommend  using  this  method  for  the  other  situations.  The  experimenter  could 
conceivably  use  artificial  targets,  as  Derringer  and  Suich  recommend  for  multiple 
response  optimization,  but  the  final  solution  could  depend  greatly  on  the  choice 
of  these  targets.  Though  he  does  give  recommendations  for  selecting  the  values 
for  the  cost  matrix,  it  still  may  be  based  on  subjective  choices  which  may  greatly 
influence  the  final  solution.  These  cost  quantities  may  be  less  well  known  than  the 
specifications  required  for  the  product.  As  was  seen  in  the  case  of  multiple  response 
optimization,  the  final  solution  can  depend  greatly  on  the  choices  of  targets  and  cost 
matrix  values. 

The  proposed  probability  method  does  have  some  shortcomings,  also,  such 
as  ease  of  calculation,  size  of  the  experiment,  and  the  ability  to  estimate  all  of  the 
parameters.  Since  a computer  program  needs  to  be  written,  this  method  is  not  as 
straight  forward  as  a method  that  could  be  done  in  a spreadsheet,  such  as  Microsoft 
Excel.  The  program  that  was  used  for  the  example  that  follows  is  similar  to  that 
listed  in  Appendix  B,  and  it  should  be  fairly  easy  to  modify  for  other  applications. 
The  potential  large  size  of  the  experiment  is  not  unique  to  the  probability  method. 
Any  time  both  the  mean  response  and  the  variance  need  to  be  estimated,  replication 
of  all  or  part  of  a traditional  design  will  be  needed. 
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5.2  Oxygen  Plasma  Anodization  Example 

A simulated  example  will  be  presented  that  deals  with  the  frequency 
adjustment  of  quartz-aluminum  oscillator  crystals.  The  following  is  a hypothetical 
process,  and  the  description  should  be  treated  only  as  motivation  for  the  simulated 
experiment  that  follows.  The  crystals  have  aluminum  plating  on  top  of  a quartz 
substrate.  The  frequency  of  the  crystals  depends  on  the  thickness  of  the  quartz  and 
the  total  mass  of  the  aluminum  plating.  The  crystal  is  adjusted  to  its  final  frequency 
by  means  of  an  oxygen  plasma.  The  crystal  is  placed  in  an  oxygen  rich  chamber 
near  a high  voltage  anode.  When  voltage  is  applied  to  the  anode,  the  surrounding 
oxygen  becomes  a plasma.  When  the  crystal  is  placed  near  the  plasma,  the  oxygen 
molecules  combine  with  the  aluminum  molecules  to  create  aluminum  oxide.  The 
extra  oxygen  molecules  increase  the  mass  of  the  aluminum  plating,  thus  reducing 
the  frequency  of  the  crystal. 

The  responses  of  interest  in  the  frequency  adjustment  of  the  crystals  are  the 
total  amount  of  frequency  change  that  can  be  achieved,  yi,  and  the  final  resistance 
of  the  crystal,  y2.  The  total  amount  of  frequency  adjustment  should  be  as  large  as 
possible,  but  must  be  greater  than  60  kHz  in  order  for  the  process  to  run  efficiently. 
Past  experience  has  shown  that  the  range  of  frequencies  after  the  initial  aluminum 
plating  is  approximately  60  kHz,  due  mostly  to  differences  in  the  thickness  of  the 
quartz  (see  Figure  5.1).  The  goal  of  initial  plating  is  to  have  the  frequency  of  the 
crystal  30  kHz  higher  than  the  final  target  frequency,  on  average.  If  this  is  achieved, 
then  the  frequency  of  the  crystals  range  from  target  to  60  kHz  above  target.  All  of 
the  crystals  are  then  put  into  the  oxygen  plasma  chamber  in  order  to  adjust  all  of 
them  to  the  same  final  frequency.  Due  to  process  variability,  not  all  lots  of  crystals 
are  targeted  properly  at  30  kHz  above  final  frequency,  thus  some  crystals  are  below 
the  final  frequency  and  some  are  more  than  60  kHz  above  the  final  target.  Those 
crystals  that  are  too  low  in  frequency  must  be  reworked.  If  the  process  cannot 
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Figure  5.1:  Frequency  Distribution  of  Quartz  Crystals. 

adjust  the  frequency  more  than  60  kHz,  the  crystals  that  are  too  high  in  frequency 
will  be  unable  to  be  adjusted  to  final  frequency,  and  must  be  reworked.  If  the  range 
of  frequency  adjustment  can  be  made  greater  than  60  KHz,  then  the  frequency 
distribution  can  be  shifted  slightly,  and  the  amount  of  crystals  that  need  to  be 
reworked  could  be  reduced.  It  is  for  this  reason  that  settings  for  the  control  variables 
x are  sought  that  maximize  the  amount  that  the  frequency  can  be  adjusted.  The 
resistance  of  the  crystal  is  effected  most  by  contaminants  in  earlier  processing  steps, 
and  by  specification  must  be  held  below  30  ohms. 

The  control  variables  that  affect  the  responses  y are  the  distance  that  the 
crystal  is  from  the  anode,  aq,  and  the  pressure  of  the  oxygen  rich  chamber,  x 
When  the  pressure  is  too  low,  there  is  not  enough  oxygen  available  to  bind  to  the 
aluminum  plating  on  the  crystal,  causing  the  total  possible  frequency  shift  to  be 
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quite  low.  As  the  pressure  increases,  the  amount  of  oxygen  available  increases,  which 
in  turn  increases  the  amount  that  the  frequency  can  be  shifted.  At  some  point, 
the  increased  pressure  causes  the  plasma  to  be  contained  too  close  to  the  anode, 
which  begins  to  reduce  the  amount  that  the  frequency  can  be  shifted.  When  the 
crystal  is  too  far  from  the  anode,  none  of  the  plasma  is  available  to  the  aluminum, 
so  the  amount  that  the  frequency  can  be  adjusted  is  fairly  small.  As  the  crystal 
moves  closer  to  the  anode,  it  comes  in  contact  with  the  plasma  cloud  that  surrounds 
the  anode,  and  thus  the  amount  that  the  frequency  can  be  adjusted  increases.  At 
some  point  the  crystal  can  become  too  close  and  actually  inhibit  the  flow  of  oxygen 
around  the  anode,  thus  reducing  the  plasma  cloud  and  decreasing  the  amount  that 
the  frequency  can  be  adjusted.  The  variability  of  the  amount  of  frequency  shift 
tends  to  increase  as  distance  between  the  anode  and  the  crystal  decreases  and  the 
pressure  increases.  The  resistance  of  the  crystal  can  increase  due  to  contaminants 
coming  off  of  the  anode  and  attaching  to  the  crystal.  This  typically  happens  more 
at  low  pressures  and  when  the  distance  between  the  anode  and  the  crystal  is  small. 
The  variability  of  the  resistance  follows  the  same  pattern  as  the  resistance  itself, 
increasing  as  the  distance  and  pressure  decrease. 

A simulated  experiment  is  performed  whose  goal  is  to  find  the  settings  of  the 
control  factors  x that  maximizes  the  probability  that  the  resistance  is  less  than  30 
ohms,  and  the  amount  that  the  frequency  can  be  adjusted  is  greater  than  60  kHz. 
The  true  functional  relationship  between  the  responses,  their  variability,  and  the 
control  factors  is  as  follows: 

yi  = 80.0  T l.Oxi  T 0.5x2  — S.Ox^  3.0x2  T 2. 0x^X2 
y2  = 20.0  + 0.5xi  - 0.5x2  + 3.0xi  + 2.0x2  + l-0xix2 
o\  = 10.0  — 3.0xi  + I.O.X2 
<r2  = 7.0  - l.Oxi  - 0.3x2 
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Table  5.1:  Experimental  Data. 


X\ 

X2 

2/i 

2/2 

Sl 

S2 

-1 

-1 

69.51801 

40.09109 

-1 

-1 

94.16965 

27.49232 

19.79 

13.23 

-1 

-1 

55.02909 

13.63547 

1 

-1 

75.88098 

12.99614 

1 

-1 

69.30468 

24.13184 

5.48 

7.43 

1 

-1 

65.00851 

27.08687 

-1 

1 

84.19299 

27.77813 

-1 

1 

63.19381 

15.60778 

12.41 

6.09 

-1 

1 

62.22724 

21.59617 

1 

1 

67.22293 

29.85625 

1 

1 

73.92506 

20.72596 

3.76 

4.84 

1 

1 

73.53838 

22.49168 

-1.414 

0 

63.58725 

24.38051 

1.414 

0 

75.03255 

33.32704 

0 

-1.414 

62.53672 

21.47707 

0 

1.414 

68.58016 

29.29213 

0 

0 

83.01640 

5.34768 

0 

0 

81.90584 

8.20455 

0 

0 

94.42947 

14.14651 

7.67 

8.28 

0 

0 

76.35438 

27.28385 

0 

0 

93.47378 

22.04167 

0 

0 

93.47162 

17.18751 

The  covariance  between  the  two  responses  is  p\2  = 0.2.  The  setting  of  the  control 
variables  that  maximizes  the  probability  that  both  responses  are  acceptable  is 
a;  = (0.302,0.073),  which  results  in  a probability  of  0.9098,  and  response  vector 
y = (79.9,20.4).  The  standard  deviation  of  the  responses  at  this  setting  is  o\  — 9.2 
and  02  = 6.7. 

The  experiment  will  be  a central  composite  design  with  the  factorial  points 
replicated  three  times  and  the  center  point  replicated  six  times.  The  axial  points  are 
not  replicated.  Data  is  simulated  for  the  experiment  and  is  in  Table  5.1.  The  data 
from  the  experiment  is  used  to  fit  a full  second  order  model  for  the  responses,  a first 
order  model  for  the  standard  deviation,  and  the  correlation  between  the  responses. 
All  of  the  data  is  used  to  separately  fit  a full  second  order  model  for  the  responses. 
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The  fitted  equations  are: 

yi(x)  — 87.1  + 0.8xi  + 0.2x2  - 7.5xJ  - 9.4x2  + l.lxix2 
y2(x)  = 15.7  + 0.2X!  + 0.2x2  + 5.3x?  + 3.5x2  + 2.1XiX2 

Just  the  five  replicated  points  are  used  to  fit  a first  order  model  for  the  standard 
deviation  of  the  responses.  The  fitted  equations  are: 

&i(x)  = 9.8  - 5.7x!  - 2.3x2 
cr2(x)  = 8.0  — 1.8xi  — 2.4x2 

The  six  center  points  are  used  to  estimate  the  correlation  between  the  two  responses. 
The  estimated  correlation  is  p\2  — —0.02,  which  is  assumed  to  be  constant  over  the 
experimental  region. 

The  optimal  setting  of  the  control  variables  x is  the  setting  which  maximizes 
P(t/i  > 60;  y2  < 30)  in  the  experimental  region  of  radius  \/2,  under  the  assumption 
that  y is  multivariate  normal  with  mean  vector  y and  variance-covariance  structure 

^ a\{x)  Pi2&i(x)v2(x)  ' 

^ Pi2^i{x)a2(x)  &l(x)  J 

The  probability  is  calculated  using  the  algorithm  from  Schervish,  which  has  been 
described  previously,  and  the  maximum  of  the  probability  function  is  obtained  from 
the  IMSL  algorithm  NCONF,  which  finds  the  minimum  of  a function  with  nonlinear 
constraints.  Based  on  this  simulated  experiment,  the  setting  of  the  control  variables 
that  maximizes  the  probability  of  both  of  the  responses  being  within  their  acceptable 
limits  is  x — (0.235,0.555).  At  this  location  in  the  experimental  region,  the  true 
response  vector  is  y = (79.6,20.8),  and  the  probability  is  0.8967.  The  standard 
deviation  of  the  responses  at  this  setting  is  o\  = 9.9  and  a2  = 6.6.  A comparison  of 
the  true  solution  to  the  experimental  solution  is  contained  in  Table  5.2.  Though  the 
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Table  5.2:  Comparison  of  Solutions. 


True  Optimum 

Experimental  Optimum 

Control  Setting 

(0.302,  0.073) 

(0.235,  0.555) 

Mean  Response 

(79.9,  20.4) 

(79.6,  20.8) 

Standard  Deviation 

(9.2,  6.7) 

(9.9,  6.6) 

Probability 

0.9098 

0.8967 

location  of  the  control  settings  appear  to  be  fairly  far  apart,  the  response,  standard 
deviation,  and  probability  are  not  that  different.  This  is  because  the  probability  of 
being  within  acceptable  limits  is  fairly  flat  near  the  true  optimal  setting. 

5.3  Repeatability  of  Anodization  Example 
The  above  analysis  is  a single  experiment  simulated  from  a hypothetical 
process  and  is  only  meant  to  be  a step  by  step  guide  to  finding  the  optimal  setting  of 
control  factors  that  results  in  the  maximum  probability  of  being  within  acceptable 
limits.  Since  this  is  an  analysis  based  on  a single  experiment,  it  would  be  beneficial 
to  assess  the  repeatability  of  running  such  an  experiment.  Since  the  experiment 
is  simulated,  it  may  be  easily  repeated.  The  experimental  data  was  regenerated 
1000  times,  each  set  of  data  being  used  to  estimate  the  mean  responses,  standard 
deviations,  correlation,  and  optimal  setting  of  the  control  factors.  The  results  are 
given  in  Table  5.3,  which  lists  the  true  probability  at  the  setting  of  the  control 
factors  and  the  distance  the  setting  is  from  its  true  optimal  setting  Xit0pt.  As  was  the 
case  with  simple  multiple  response  optimization,  the  repeatability  of  this  simulated 
experiment  is  not  particularly  good.  A scatter  plot  of  the  the  locations  of  the 
experimental  optima  is  in  Figure  5.2,  which  has  50  and  90%  confidence  ellipses.  This 
figure  is  perhaps  a better  indication  that  a single  experiment  may  not  do  a good 
job  in  finding  a point  that  is  close  to  the  true  optimum.  As  was  seen  with  the  case 
of  multiple  response  optimization,  the  lack  of  repeatability  does  not  seem  to  be  a 
problem  that  is  isolated  to  one  method  of  optimization,  but  seems  to  be  a problem 
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Table  5.3:  Repeatability  of  Experiment. 


True  Probability 

%l  %l,opt 

^2,  opt 

Mean 

0.821 

0.099 

-0.022 

Std.  Dev. 

0.081 

0.564 

0.735 

Median 

0.847 

0.108 

-0.011 

75th  %ile 

0.891 

0.533 

0.467 

25th  %ile 

0.750 

-0.277 

-0.524 

that  will  be  encountered  in  any  optimization  experiment  where  a relatively  small 
amount  of  data  is  generated. 
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Figure  5.2:  Location  of  Optimal  Settings. 


CHAPTER  6 

CONCLUSION  AND  FURTHER  RESEARCH 
The  probability  method  has  been  shown  to  work  well  for  the  situations  of 
robust  parameter  design,  multiple  response  optimization,  and  multiple  response 
robust  parameter  design.  In  each  case,  it  performs  as  well  as  or  better  than  the 
methods  that  have  been  previously  proposed.  The  analyses  have  also  shown  some 
serious  potential  problems  with  some  of  the  other  proposed  methods.  The  proposed 
probability  method  is  a unified  and  intuitive  approach  to  the  optimization  problems 
that  have  been  presented  here. 

The  analysis  that  has  been  done  does  point  to  some  areas  in  the  field  of 
optimization  that  still  need  to  be  addressed.  Perhaps  the  most  important  area 
that  needs  to  be  looked  at  has  to  do  with  the  repeatability  of  any  optimization 
experiment.  As  has  been  shown,  a single  optimization  experiment  may  give  an 
optimal  setting  that  is  far  from  the  true  optimum.  This  is  a fact  that  experimenters 

must  be  aware  of  before  they  begin  an  such  an  experiment.  Much  cost  and  effort 

* 

could  go  into  an  experiment  that  returns  a suggested  combination  of  the  control 
settings  that  is  far  from  optimal.  Such  a situation  could  lead  some  to  become 
skeptical  of  not  only  optimization  experiments,  but  other  statistical  techniques, 
as  well.  For  this  reason,  there  needs  to  be  more  research  on  the  repeatability  of 
optimization  experiments,  and  possible  ways  to  make  them  more  repeatable.  It 
is  unclear  if  there  are  experimental  designs  that  would  lead  to  more  repeatable 
experiments,  or  if  there  are  conditions  on  the  responses  that  must  be  met  in  order 
for  the  repeatability  to  be  more  acceptable.  There  may  also  be  other  analysis 
techniques  that  could  make  the  repeatability  better,  such  as  using  prior  production 
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data  in  a Bayesian  manner  to  improve  prediction,  or  using  confirmation  experiments 
to  either  verify  the  results  of  the  first  experiment  or  to  improve  the  prediction  in  the 
first  experiment. 

A single  optimization  experiment  will  probably  not  result  in  the  true  optimal 
settings  of  the  control  variables.  The  experiment  should,  however,  result  in  settings 
of  the  control  variables  that  are  close  to  the  optimal  settings,  and  methods  should 
be  explored  that  will  most  easily  find  these  optimal  settings.  This  should  be  part 
of  a method  of  continuous  improvement,  where  the  process  is  constantly  trying  to 
be  improved.  Continuous  improvement  will  also  help  to  avoid  problems  that  may 
arise  from  natural  changes  to  the  process,  such  as  equipment  wear,  changes  in  raw 
materials,  etc. 

Though  most  of  the  analyses  that  were  performed  here  relied  on  normality, 
this  assumption  may  be  able  to  be  relaxed.  Since  the  normality  was  only  used  to 
simplify  the  calculation  of  the  probability,  it  is  conceivable  that  other  distributional 
assumptions  could  be  used.  In  fact,  a Weibull  distribution  was  used  in  an  example 
for  robust  parameter  design.  If  other  distributions  can  be  used  in  the  univariate 
case,  they  may  also  be  able  to  be  used  in  the  multivariate  cases. 


APPENDIX  A 

SOLVER  COMMAND  IN  MICROSOFT  EXCEL 
The  following  is  a description  of  how  to  use  Microsoft  Excel  to  find  the  levels 
of  the  control  variables  that  will  maximize  the  probability  that  the  response  of 
interest  is  within  its  specification  limits.  This  description  is  for  robust  parameter 
design  situations  where  there  is  only  on  response  of  interest  whose  variance  depends 
on  the  levels  of  the  control  variables.  The  situations  where  there  are  multiple 
responses  is  covered  in  appendix  B.  The  numbers  used  below  are  from  the  printing 
press  example  given  in  chapter  3.  The  following  description  also  contains  information 
on  the  other  methods  that  have  been  used  to  analyze  this  data  set. 

Figures  A.l  and  A. 2 show  the  spreadsheet  and  the  solver  command  window. 
The  spreadsheet  has  the  optimal  solutions  for  the  proposed  probability  method, 
the  constrained  optimization  method  of  Vining  and  Myers,  the  squared  error  loss 
method  of  Lin  and  Tu,  and  the  constrained  optimization  method  of  Copeland  and 
Nelson  for  two  choices  of  A.  In  each  of  the  methods,  the  cell  below  the  cell  labeled 
’’Mean”  is  calculated  from  equation  3.1,  with  the  xi,X2,and  Xs  replace  by  the 
appropriate  cell.  Similarly  the  cell  below  ”Std  Dev”  is  calculated  from  equation  3.2. 
Similarly,  all  of  the  methods  calculate  the  distance  from  the  origin  to  the  point  x , 
which  is  in  the  cell  under  ’’Radius.”  For  example,  in  the  area  for  the  probability 
method,  the  mean  in  cell  DIO  is  given  by 

=327.62963  + 177  * A10  + 109.42593  * BIO  + 131.46296  * CIO  (A.l) 

+ 32  * A10  * A10  - 22.38889  * BIO  * BIO  - 29.05556  * CIO  * CIO 
+ 66.027778  * A10  * BIO  + 75.472222  * A10  * CIO  + 43.583333  * BIO  * CIO 
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Figure  A.l:  Microsoft  Excel  Spreadsheet. 


and  standard  deviation  in  cell  Dll  is  given  by 

=34.883248  + 11.526786  * 410  + 15.323036  * BIO  + 29.190296  * CIO  (A.2) 

+ 4.2037439  * 410  * 410  - 1.31585  * BIO  * B10  + 16.777879  * CIO  * CIO 
+ 7.7194614  * 410  * B10  + 5.1092608  * 410  * CIO  + 14.081718  * B10  * CIO. 

The  radius  given  in  cell  E10  is  calculated  by 

= SQRT(A10  * 410  + B10  * B10  + CIO  * CIO). 

Finding  the  optimal  point  for  the  probability  method  begins  by  calculating 
the  probability  of  being  within  specifications.  The  probability  of  being  between  the 
lower  limit  (cell  A6)  and  the  upper  limit  (cell  C6)  is  calculated  in  cell  D6  by 

= NORM DIST(C0,  DIO,  E10,  TRUE)  - NORM DIST(A0,  D10,  E10,TRUE). 
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Figure  A. 2:  Window  for  Solver  Command. 

The  function  NORMDIST  gives  the  cumulative  distribution  of  a given  value  for  a 
normal  function  with  a given  mean  and  standard  deviation.  In  order  to  maximize 
the  probability  calculated  in  cell  D6,  the  user  should  first  enter  starting  values  of 
a;i,x2,and  x3  into  cells  A10,  BIO,  and  CIO,  respectively.  Then  the  user  selects  the 
” Solver”  option  from  the  ’’Tools”  menu.  Selecting  ’’Solver”  will  open  up  the  window 
given  in  Figure  A. 2.  The  target  cell  is  the  probability  calculated  in  D6,  which  is 
what  is  to  be  maximized.  The  maximization  is  to  be  done  by  changing  cells  A10, 
BIO,  and  CIO,  which  can  be  entered  as  ”$A$10:$C$10.”  A constraint  can  be  added 
which  keeps  the  search  within  a radius  of  1,  which  is  entered  as  ”$F$10<=1.”  When 
all  of  this  is  entered,  click  on  the  button  labeled  ’’Solve,”  and  the  algorithm  will 
find  the  location  of  the  control  variables  x that  maximizes  the  probability  of  being 
within  specification. 

The  other  methods  are  solved  in  a similar  manner.  The  Vining  and  Myers 
method  uses  the  solver  command  in  a manner  similar  to  that  of  the  probability 
method.  In  this  case,  however,  the  target  cell  is  E18,  the  standard  deviation  of  the 
response,  which  should  be  minimized.  An  addition  constraint  is  added  that  keeps 
the  mean  in  cell  D18  at  500.  The  method  of  Lin  and  Tu  needs  to  calculate  a loss  in 
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cell  B22,  which  is  given  by 

= (£>26  - A22)  * (£>26  - A22)  + £26  * £26, 

or  the  bias  squared  plus  the  variance.  With  this  method,  the  target  cell  is  the  loss 
in  cell  B22,  and  it  is  minimized.  For  the  method  of  Copeland  and  Nelson,  a delta 
is  calculated  in  cell  16  (or  cell  114),  which  is  the  absolute  value  of  the  difference  of 
the  mean  in  cell  K10  (or  cell  K18)  from  the  target  mean  of  500  in  cell  H6  (or  cell 
H14).  In  this  method,  the  target  cell  is  the  standard  deviation  in  cell  L10  (or  cell 
L18),  which  is  to  be  minimized.  An  additional  constraint  is  that  the  delta  calculated 
in  cell  16  (or  cell  114)  be  less  than  the  given  maximum  delta  given  in  cell  J6  (or 
cell  J14).  For  each  of  the  methods,  the  probability  of  being  within  specifications  is 
calculated,  and  this  is  done  in  the  same  fashion  as  for  the  probability  method. 


APPENDIX  B 

FORTRAN  CODE  FOR  MULTIVARIATE  NORMAL  PROBABILITY 
The  following  code  was  used  to  find  the  maximum  probability  of  being  within 
specification  for  the  multiple  response  optimization  problem.  It  also  calculates  the 
probability  of  being  within  specifications  for  the  solutions  that  the  other  methods 
have  suggested.  The  subroutine  mulnor  makes  up  a majority  of  the  program,  and  it 
can  be  found  on  the  internet  web  page  for  StatLib  at  http://lib.stat.cmu.edu.  The 
subroutine  is  algorithm  195  from  the  Applied  Statistics  section.  The  numbers  in  the 
example  below  are  from  the  tire  tread  compound  experiment  described  in  chapter  4. 
program  exl 
use  msimsl 
USE  PORTLIB 

integer  N,  ny  ,xl,x2,x3 
parameter (ny=4) 
parameter  (N=3) 

dimension  Xds(M) ,Xkc(N) ,Xamesl(N) ,Xames2(N) ,y(ny) ,Xmax(N) ,vining(N) 

data  Xds/-0 . 050,0 . 145,-0 .868/ 

data  Xkc/0. 534, 1.536,-0. 148/ 

data  Xamesl/-0 . 461 , -0.283,  -0.528/ 

data  Xames2/0.379,  0.548,  -0.640/ 

data  vining/0.073,  0.408,  -0.549/ 

INTEGER  IPARAMC7),  ITP,  L,  N0UT,  MAXFCN 

REAL  F,  FSCALE,  RPARAM(7) , X(N) , XGUESS(N),  XLB(N) , XSCALE(N) 

REAL  XUB(N),  FT0L  ,maxF 
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integer  tempi,  temp2 
REAL (8)  elapsed_time 
EXTERNAL  mprob 

! XGUESS  is  the  starting  point  of  the  search  algorithm 
DATA  XGUESS/ 0.5,  0.5,  -0.5/ 
data  XSCALE/1 .0,1.0,  1.0/,  FSCALE/1.0/ 

! XLB  and  XUB  are  the  lower  and  upper  bounds  of  the  search 
DATA  XLB/-1. 633, -1.633, -1.633/,  XUB/1 . 633, 1 . 633, 1 . 633/ 

I TP  = 0 

! Default  parameters  are  used 
IPARAM(l)  = 0 
MAXFCN=5000 
FT0L=1 . OE-5 
elapsed_time  = TIMEFO 

i 

! BCP0L  is  a Nelder-Mead  type  search  algorithm 
! Starting  at  XGUESS,  it  returns  the  value  X that  minimizes  the 
! function  mprob  (a  subroutine  contained  in  this  program) . The 
! value  F is  the  value  of  the  function  at  the  minimum. 

i 

! Note  that  since  the  probability  it  to  be  maximized,  the 
! subroutine  mprob  returns  the  negative  of  the  true  probability. 

i 

CALL  BCP0L  (mprob,  N,  XGUESS,  ITP,  XLB,  XUB,  FT0L,  MAXFCN , X,  F) 
elapsed_time  = TIMEFO 
print*, X ( 1 ) ,X(2) ,X(3) , MAXFCN, F 
call  evaluate (X,y) 
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print*,y(l),  y(2) , y(3),  y(4) 
print* , elapsed_time 

! The  probability  is  calulated  for  the  Derringer-Suich  method, 
print* , "Derringer-Suich" 
call  mprob(N,  Xds,  F) 
print*, F 

call  evaluate (Xds ,y) 
print*, y(l),  y(2),  y(3) , y(4) 

! The  probability  is  calulated  for  the  Khuri-Conlon  method, 
print*, ’Khuri-Conlon’ 
call  mprob(N,  Xkc,  F) 
print*, F 

call  evaluate (Xkc, y) 
print*, y(l),  y(2),  y(3) , y(4) 

! The  probability  is  calulated  for  the  Ames,  et  al.  method, 
print* , * Ames ’ 
call  mprob(N,  Xamesl,  F) 
print* ,F 

call  evaluate (Xamesl ,y) 
print*, y(l),  y(2),  y(3) , y(4) 

! The  probability  is  calulated  for  the  Vining  method, 
print* , ’Vining’ 
call  mprob(N,  vining,  F) 
print* ,F 

call  evaluate (vining, y) 
print*, y(l),  y(2),  y(3) , y(4) 


i 
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The  following  will  find  the  value  of  the  probability  for  a 
grid  of  points  in  the  experimental  region.  This  data  file 
can  be  exported  and  used  to  generate  contour  plots  of  the 
probability  in  the  region  of  interest.  It  may  be  eliminated 
if  it  is  not  needed. 

open (3 , FILE= ; probnl . dat ; ) 
elapsed_time  = TIMEFO 
temp2=163**2 
maxF=0 . 0 
do  xl=-16,16 
do  x2=-16 , 16 

templ=xl**2+x2**2 
if  (templ<temp2)  then 
do  x3=-16,16 

if  (templ+x3**2<temp2)  then 
X(l)=real(xl)/10 
X(2)=real(x2)/10 
X(3)=real(x3)/10 
call  mprob(N,X,F) 

F=(-1.0)*F 
if  (F>maxF)  then 
maxF=F 
Xmax(l)=X(l) 

Xmax(2)=X(2) 

Xmax(3)=X(3) 


end  if 


end  if 


write  (3, ; (F10.5,A,F10.5,A,F10.5,A,F10.5) ; ) X ( 1 ) , 
& ,X(2),>  >,X(3),'  >,F 

end  do 
end  if 
end  do 
end  do 

elapsed_time  = TIMEF () -elapsed_time 
print* ,Xmax(l) ,Xmax(2) ,Xmax(3) ,maxF 
call  evaluate (Xmax,y) 
print* ,y(l) , y(2),  y(3) , y(4) 
print* , elapsed_time 
close (3) 
end 
! 

! This  is  the  start  of  the  subroutines 
SUBROUTINE  mprob  (N,  X,  F) 

! This  returns  the  probability  F at  a point  X. 

! 

! The  key  is  the  subroutine  mulnor,  which  calculates  the 
! probability  of  being  within  a rectangular  region,  assuming 
! a multinormal  distribution. 

i 

INTEGER  N 
REAL  X(N) , F 
parameter  (ny=4) 
integer  i 
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dimension  a(ny) ,b(ny) ,sig(ny*(ny-l)/2) ,y(ny) ,yu(ny) ,yl(ny) 

dimension  stdev(ny) 

integer  inf (ny) , if ault 

real  prob, bound, eps 

data  eps/0.0001/ 

! sig  gives  the  upper  triangular  portion  of  the  correlation  matrix 
data  sig/0. 01885893,  -0.02720418,  -0.22046146,  0.15876509, 

& 0.072886,  -0.05203539/ 

! inf  gives  the  types  of  bounds  on  the  responses: 

! 0 is  used  if  the  upper  bound  is  infinity 

! 1 is  used  if  the  lower  bound  is  infinity 

! 2 is  used  if  both  bounds  are  specified 

data  inf /0, 0,2, 2/ 

! yl  and  yu  are  the  lower  and  upper  bounds  of  the 
! rectangular  region 

data  yl  /120 . 0, 1000 . 0,400. 0 ,60 . 0/ , yu  /500.0, 1500.0,600.0,75.0/ 

! Since  the  individual  responses  are  assumed  to  have  a variance  of 
! 1,  they  must  be  scaled  by  the  standard  deviations, 
data  stdev  /5.6112,  328.69,  20.549,  1.2674/ 
call  evaluate (X,y) 
do  i=l,ny 

a(i)=(yu(i)-y(i) ) /stdev (i) 
b ( i)  = (yl  ( i)  — y ( i) ) /stdev  (i) 
end  do 

call  mulnor (a, b, sig, eps ,ny, inf , prob, bound, if ault) 

F=prob  * (-1) 


end 
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subroutine  evaluate (x,y) 

! This  evaluates  the  Derringer-Suich  fit  for  a given  x. 
dimension  y(*),x(*) 

y Cl)  = 139 . 1192387+16 . 4936447 *x ( 1)  + 17 . 8807 651*x (2)  + 

& 10 . 9065385*x(3) -4 . 0096009*x ( 1) **2-3 . 4471056*x (2) **2 

y Cl)  = y ( 1 ) — 1 - 5721213*x(3) **2+5 . 125*x(l)*x(2)+7 . 125*x(l) *x(3) 

& +7 . 875*x(2) *x(3) 

y (2)  = 1261. 133138+268. 151102*x(l) +246. 503174*x(2) +139. 484533*x(3) 
& -83 . 565885*x ( 1) **2-124 . 815539*x (2) **2 

y (2)  = y(2)+199.181747*x(3)**2+69.375*x(l)*x(2)+94.125*x(l)*x(3) 

& +104 . 375*x(2) *x(3) 

y (3)  = 400 . 3845754-99 . 6664161*x (1) -31 . 3963948*x (2) -73 . 9190024*x (3) 
& +7 . 9326889*x (1) **2+17 . 3076104*x (2) **2 

y (3)  = y(3)+0.4327517*x(3)**2+8.75*x(l)*x(2)+6.25*x(l)*x(3) 

& +1 . 25*x(2) *x(3) 

y (4)  = 68 . 90961521-1 . 40984528*x (1) +4 . 31968553*x (2) +1 . 63484452*x (3) 
& +1 . 55768153*x (1) **2+0 . 05769409*x (2) **2 

y (4)  = y (4)-0 . 31730277*x(3) **2-1 . 625*x(l) *x(2)+0 . 125*x(l) *x(3) 

& -0.25*x(2)*x(3) 


end 


The  following  is  the  Fortran  program  that  was  used  to  generate  the  simulation 


results  for  the  repeatability  of  multiple  response  optimization.  The  subroutines  of 
the  program  that  are  identical  to  the  previous  program  are  not  included, 
program  DSsim 


use  msimsl 


USE  P0RTLIB 
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integer  N,  NOBS  , NIND  ,LDX,NC0EF, INTCEP 

integer  M,  ME, IPRINT, IERSVR,  IPACT,  ISACT,  ICODE 

parameter  (INTCEP=1,  NIND=9,  N0BS=20,  LDX=N0BS,  NCOEF=INTCEP+NIND) 

parameter  (N=4,  M=2,  ME=0,  IPRINT=0,  IERSVR=0,  IPACT=-1,  ISACT=0) 

real  X(3),y(N),  yvec(NOBS),  R(N) 

real  stdev(N)  ,xdesign(20,3)  ,B(NC0EF),  SST,  SSE 

real  XGUESS(3),  XLB(3),  XUB(3),  F ,F1 ,F2,F3,FT0L 

real  XG1(3),  XG2(3) 

real  XSCALE(3) , FSCALE,  RPARAM(7)  ,minF 

common  /fitted/  bmatrix(10,4) 

common  /sighat/  sig(6) , sstmat (4) , ssemat(4) 

common  /mstuff/  siginv(4,4),  xpxinv(10 , 10) 

common  /actual/  ymat(20,4),  xmatrix(20,9) 

integer  i,j,  k,  ITP,  IPARAM(7)  ,MAXFCN,  loopiter 

EXTERNAL  ds,  mprob,  getmaxyl,  getmaxy2,  kc,  dssphere,  mprobsp 

data  stdev  /5 .61 ,328 . 69 ,20 . 55, 1 . 27/ 

DATA  XSCALE/ 1.0, 1.0, 1.0/,  FSCALE/ 1 . 0E0/ 

DATA  XLB/-1. 633, -1.633, -1.633/,  XUB/1 . 633, 1 . 633, 1 . 633/ 

This  gets  the  design  points  and  generates  the  design  matrix 
call  f illxdes (xdesign) 
call  fillxmat (xdesign) 

call  RNOPT (7) 
call  RNSET(O) 


Open  a file  to  put  all  of  the  solutions  into 
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open (3 , FILE= ’ solution . dat ’ ) 

Select  how  many  iterations  of  the  simulation  to  perform 
do  ii=l,1000 

print* ,’ Iteration  = ’,ii 

This  fills  the  y matrix  with  randomly  generated  data 
do  i=l ,20 
do  j=l ,3 

x(j)=xdesign(i, j) 
end  do 

call  fittedy(x.y) 

CALL  RNNOR  (N,  R) 
do  j=l ,4 

ymat (i , j ) =y ( j ) +R( j ) *stdev ( j ) 
end  do 
end  do 

Now  we  are  going  to  fit  all  of  the  data  to  a full  second 
order  polynomial, 
do  j=l ,4 
do  i=l,20 

yvec(i)=ymat(i, j) 
end  do 

CALL  RLSE  (NOBS,  yvec,  NIND,  xmatrix,  LDX,  INTCEP,  B,  SST,  SSE) 
do  i=l,  NCOEF 


bmatrix(i, j)=B(i) 
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end  do 

sstmat (j)=SST 
ssemat (j)=SSE 
end  do 
call  getsig 

Now  to  see  if  we  can  maximize  the  whole  thing 

First  using  Derringer  and  Suich's  method 

minF=l . 0 

ytar(l)=0.0 

ytar(2)=0.0 

This  finds  a good  starting  point,  which  is  used  for  all  methods, 
do  i=-10, 10 

x(l)=real(i)/10.0 
do  j=-10 , 10 

x(2)=real(j)/10.0 
do  k=-10,10 

x(3) =real (k) / 10 . 0 
call  ds(N,  x,  F) 
call  gety(x,y) 
if  (F  . LT.  minF)  then 
minF=F 

XGUESS(l)=x(l) 

XGUESS(2)=x(2) 


XGUESS(3)=x(3) 
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end  if 

if  (y ( 1 ) .GT.  ytar(l))  then 
ytar(l)=y(l) 

XGl(l)=x(l) 

XG1 (2)=x(2) 

XG1 (3)=x(3) 
end  if 

if  (y (2)  .GT.  ytar(2))  then 
ytar(2)=y(2) 

XG2(l)=x(l) 

XG2(2)=x(2) 

XG2(3)=x(3) 
end  if 
end  do 
end  do 
end  do 

Occasionally  the  starting  point  is  too  close  to  the  maximum, 
so  the  maximization  algorithm  fails.  If  this  is  the  case, 
the  starting  point  is  perturbed  a bit  and  the  maximization 
is  tried  again. 
loopiter=0 

do  while  (loopiter  .LT.  3) 

CALL  ERSET  (IERSVR,  IPACT,  ISACT) 

MAXFCN=5000 


This  algorithm  returns  the  maximum  within  the  spherical 


no 


! region  of  interest.  The  optimal  point  is  x,  with  the 
! desirability  given  as  F.  The  desirability  is  calculated  in 
! the  subroutine  dssphere. 

CALL  NCONF  (dssphere,  M,  ME,  N,  XGUESS,  ITP,  XLB,  XUB, 

& XSCALE,  IPRINT,  MAXFCN , x,  F) 

ICODE  = IERCD ( ) 

if  ( (x ( 1 ) .EQ.  XGUESS (1) ) .or.  (ICODE  .EQ.  2))  then 
do  i=l,3 

XGUESS ( i ) =XGUESS ( i ) +0 . 1 
end  do 
else 

loopiter=3 
end  if 

loopiter=loopiter+l 
F= (-1 . 0) *F 
print*,x,F,  ICODE 
end  do 

! The  following  gives  additional  information  for  the  point  x. 
call  mprob(N,  x,  FI) 

Fl= (-1 . 0) *F1 

call  trueprob(N,  x,  F2) 

F2= (-1 . 0) *F2 

call  trueds(N,  x,  F3) 

F3= (-1 . 0) *F3 
call  gety(x,y) 

! And  it  is  all  written  to  a file. 
write(3,99)  x,y,F,  F3,  F2,  ICODE 
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! Now  try  the  probability  method 
IPARAM(l)  = 0 
loopiter=0 

! A few  tries  are  allowed  for  the  same  reason  as  above, 
do  while  (loopiter  .LT.  3) 

CALL  ERSET  (IERSVR,  IPACT,  ISACT) 

MAXFCN=5000 

! Same  as  above.  The  subroutine  mprobsp  returns  the 
! probability,  which  is  maximized. 

CALL  NCONF  (mprobsp,  M,  ME,  N,  XGUESS,  ITP,  XLB,  XUB, 
& XSCALE,  IPRINT,  MAXFCN,  x,  F) 

ICODE  = IERCD ( ) 

if  ( (x(l)  .EQ.  XGUESS (1) ) .or.  (ICODE  .EQ.  2))  then 
do  i=l,3 

XGUESS ( i ) =XGUESS ( i ) +0 . 1 
end  do 
else 

loopiter=3 
end  if 

loopiter=loopiter+l 
F=(-1.0)*F 
print*,x,F,  ICODE 
end  do 

! Again,  more  details  about  the  point  x. 
call  trueprob(N,  x,  F3) 

F3= (-1 . 0) *F3 
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call  gety(x,y) 

! And  write  to  file. 
write(3,98)  x,y,F,  F3,  FI,  ICODE 
end  do 
close (3) 

99  FORMAT ( ’ DS ’ ,3F10.5,2X,4F15.4,2X,3F10.5,I6) 

98  FORMAT ( ’ME’ , 3F10 . 5 , 2X.4F15 . 4, 2X, 3F10 . 5 , 16) 
end 
! 

subroutine  fittedy(x.y) 

! This  is  the  assumed  true  functional  relationship 
! between  the  responses  and  the  control  factors, 
real  x(3) ,y (4) 

y(l)=139 . 1192387+16.4936447*x(l)+17.8807651*x(2)+10.9065385*x(3) 
y(l)=y(l)-4.0096009*x(l)**2-3.4471056*x(2)**2-1.5721213*x(3)**2 
y(l)=y(l)+5.125*x(l)*x(2)+7.125*x(l)*x(3)+7.875*x(2)*x(3) 
y (2) =1261. 133138+268. 151102*x(l) +246. 503174*x(2)+139.484533*x (3) 
y (2)=y (2)-83 . 565885*x(l) **2-124.815539*x(2) **2+199 . 181747*x(3) **2 
y (2)=y (2)+69 . 375*x(l) *x(2)+94 . 125*x(l) *x(3)+104.375*x(2)*x(3) 
y (3) =400 . 3845754-99 . 6664161*x (1) -31 . 3963948*x (2) -73 . 9190024*x (3) 
y (3)=y (3)+7 . 9326889*x(l) **2+17. 3076104*x(2) **2+0 .4327517*x(3) **2 
y(3)=y(3)+8.75*x(l)*x(2)+6.25*x(l)*x(3)+1.25*x(2)*x(3) 
y (4) =68 . 90961521-1 . 40984528*x (1) +4 . 31968553*x (2) +1 . 63484452*x (3) 
y(4)=y(4)+1.55768153*x(l)**2+0.05769409*x(2)**2-0.31730277*x(3)**2 
y(4)=y(4)-1.625*x(l)*x(2)+0.125*x(l)*x(3)-0.25*x(2)*x(3) 
end 


i 


subroutine  gety(X,  Y) 

! This  returns  the  values  of  the  responses,  based 
! on  the  fitted  equations  that  were  generated  from 
! the  simulated  data, 
real  X(3),  Y(4) 
integer  i 

common  /fitted/  bmatrix(10,4) 
do  i=l,  4 

Y ( i) =bmatrix ( 1 , i) +bmatrix (2 , i ) *X ( 1) +bmatrix (3 , i) *X (2) + 
& bmatrix(4,i)*X(3) 

Y(i)=Y (i)+bmatrix(5 , i)*X(l) **2+bmatrix(6 , i)*X(2) **2+ 

& bmatrix(7 , i) *X(3) **2 

Y(i)=Y(i)+bmatrix(8,i)*X(l)*X(2)+bmatrix(9,i)*X(l)*X(3) 
& bmatrix(10,i)*X(2)*X(3) 

end  do 
end 
! 

subroutine  getsig 

! This  estimates  the  sigma  hat  matrix,  which  must  be 
! done  for  each  simulated  set  of  data.  It  is 
! complicated,  but  it  does  work  correctly, 
common  /actual/  ymat(20,4),  xmatrix(20,9) 
common  /mstuff/  siginv(4,4),  xpxinv(10, 10) 
common  /sighat/  sig(6)  ,sstmat(4),  ssemat(4) 
real  xmat (20 , 10) .tempi (20 ,20) , temp2(20, 10) , temp3(4,20) 
real  xpx(10,10),  sigma(4,4) ,sigmal(4,4) ,sse(4,4) 
integer  i,  j,  k,  NRA,  NCA,  LDA,  NB,  LDB,  NCY 
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data  NRA  /20/,  NCA  /10/,  LDA  /20/,  NB  /10/,  LDB  /10/,  NCY  /4/ 
do  i=l,20 

xmat (i , 1)=1 .0 
do  j=l ,9 
k=j+l 

xmat ( i , k) =xmatr ix ( i , j ) 
end  do 
end  do 

CALL  MXTXF  (NRA,  NCA,  xmat,  LDA,  NB,  xpx,  LDB) 

CALL  LINRG  (NB,  xpx,  LDB,  xpxinv,  LDB) 

CALL  MRRRR (NRA , NCA , xmat , LDA , NB , NB , xpxinv , LDB , NRA , NB , temp2 , LDA) 

CALL  MXYTF ( NRA , NCA , t emp2 , LD A , NRA , NC A , xmat , LDA , NRA , NRA , t emp 1 , LD A ) 
do  i=l ,20 
do  j=l , 20 

if  (i  .EQ.  j)  then 

tempi (i, j)=1.0-templ(i, j) 
else 

tempi (i, j)=templ(i, j)*(-1.0) 
end  if 
end  do 
end  do 

CALL  MXTYF (NRA , NCY , ymat , LDA , NRA , NRA .tempi , LDA , NCY , NRA , temp3 , NCY) 
CALL  MRRRR (NCY , NRA , temp3 , NCY , NRA , NCY , ymat , LDA , NCY , NCY , s igma , NCY ) 
do  i=l  ,4 
do  j=l ,4 

sigma (i , j)=sigma(i , j ) / 10 .0 
if  (i  .EQ.  j)  then 
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sse(i, j)=1.0/sqrt(sigma(i, j)) 
else 

sse(i, j)=0.0 
end  if 
end  do 
end  do 

CALL  LI NRG  (NCY,  sigma,  NCY,  siginv,  NCY) 

CALL  MRRRR (NCY , NCY , sse , NCY , NCY , NCY , sigma , NCY , NCY , NCY , sigmal , NCY) 
CALL  MRRRR (NCY , NCY , sigmal , NCY , NCY , NCY , sse , NCY , NCY , NCY , sigma , NCY) 
! This  gives  the  upper  triangular  portion  of  the 
! correlation  matrix,  which  is  needed  to  calculate 
! the  probability  of  interest. 
sig(l)=sigma(2, 1) 
sig(2)=sigma(3, 1) 
sig(3)=sigma(3,2) 
sig(4)=sigma(4, 1) 
sig(5)=sigma(4,2) 
sig(6)=sigma(4,3) 
end 

i 

SUBROUTINE  mprob  (N,  X,  F) 

! This  returns  the  probability  based  on  the 
! simulated  data. 

INTEGER  N 

REAL  X(N) , F 

parameter  (ny=4) 
integer  i 
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dimension  a(ny) ,b(ny) ,y(ny) ,yu(ny) ,yl(ny) 

common  /sighat/  sig(6)  ,sstmat(4),  ssemat(4) 

dimension  stdev(ny) 

integer  inf (ny) , if ault 

real  prob, bound, eps 

data  eps/0.0001/ 

data  inf /0, 0,2, 2/ 

data  yl  /120.0, 1000. 0,400. 0,60.0/, yu  /500.0, 1500.0,600.0,75.0/ 
do  i-1,4 

stdev (i) =sqrt (ssemat (i) / 10.0) 
end  do 

call  gety(X,y) 
do  i=l,ny 

a(i)=(yu(i)-y(i))/stdev(i) 
b (i)  = (yl  ( i)  -y  (i) ) /stdev  (i) 
end  do 

call  mulnor (a, b,sig, eps ,ny, inf , prob, bound, if ault) 

F=prob  * (-1) 
end 

i 

SUBROUTINE  mprobsp  (M,  ME,  N,  X,  ACTIVE,  F,  G) 

! This  is  used  in  the  maximization  algorithm.  It  is 
! basically  the  same  as  mprob,  but  adds  the  nonlinear 
! constraints  that  the  solution  must  be  within  a 
! radius  of  1.633. 

INTEGER  N,  M,  ME 

X(*) , F,  G(*) , dist 


REAL 
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logical  ACTIVE (*) 
parameter  (ny=4) 
integer  i 

dimension  a(ny) ,b(ny) ,y(ny) ,yu(ny) ,yl(ny) 

common  /sighat/  sig(6)  ,sstmat(4),  ssemat(4) 

dimension  stdev(ny) 

integer  inf (ny) , if ault 

real  prob, bound, eps 

data  eps/0.0001/ 

data  inf /0, 0,2, 2/ 

data  yl  /120 . 0 , 1000 . 0,400 . 0,60 . 0/ ,yu  /500. 0,1500. 0,600. 0,75.0/ 
do  i=l,4 

stdev(i) =sqrt (ssemat (i) / 10.0) 
end  do 

call  gety(X,y) 
do  i=l,ny 

a(i)=(yu(i)-y(i))/stdev(i) 
b ( i) = (yl ( i) — y ( i) )/stdev(i) 
end  do 

call  mulnor (a, b,sig, eps ,ny, inf , prob, bound, if ault) 

F=prob  * (-1) 

dist=sqrt (X ( 1 ) **2+X(2) **2+X(3) **2 ) 
if  (ACTIVE (1) ) G(l)=dist 
if  (ACTIVEC2))  G (2) =1 . 633-dist 
end 

i 


SUBROUTINE  trueprob  (N,  X,  F) 
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! This  returns  the  probability  based  on  the  assumed 
! fit  obtained  from  the  original  Derringer-Suich  data. 

INTEGER  N 

REAL  X (N)  , F 

parameter  (ny=4) 
integer  i 

dimension  a(ny) ,b(ny) ,y(ny) ,yu(ny) ,yl(ny) 

common  /sighat/  sig(6)  ,sstmat(4),  ssemat(4) 

dimension  stdev(ny) 

integer  inf (ny) , if ault 

real  prob, bound, eps 

data  eps/0.0001/ 

data  inf/0,0,2,2/ 

data  yl  /120.0, 1000. 0,400. 0,60.0/, yu  /500. 0,1500. 0,600. 0,75.0/ 
data  stdev  /5.6112,  328.69,  20.549,  1.2674/ 
call  fittedy(X,y) 
do  i=l,ny 

a(i)=(yu(i)-y (i) ) / stdev (i) 
b ( i)=(yl(i)-y(i)) /stdev (i) 
end  do 

call  mulnor (a , b , sig , eps , n , inf , prob , bound , if ault) 

F=prob  * (-1) 
end 

i 

subroutine  ds(N,  X,  F) 

! This  returns  the  desirability  based  on  the 


! simulated  data. 
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integer  N 

real  X(N) , F,  Y(4) , d 
call  gety(X,Y) 
if  (Y (1)  .LT.  120.0)  then 
d=0 . 0 
else 

if  (Y(l)  .GT.  170.0)  then 
d=l . 0 
else 

d=(Y(l) -120.0)/ (170. 0-120.0) 
end  if 
end  if 

if  (Y (2)  .LT.  1000.0)  then 
d=0 . 0 
else 

if  (Y (2)  .GT.  1300.0)  then 
d=d*l . 0 
else 

d=d* (Y(2) -1000 . 0) / (1300 . 0-1000 . 0) 
end  if 
end  if 

if  ( (Y (3)  .LT.  400.0)  .OR.  (Y(3)  .GT.  600.0))  then 
d=0 . 0 
else 

if  (Y (3)  .LT.  500.0)  then 

d=d* (Y (3) -400 . 0) / (500 . 0-400 . 0) 


else 


d=d* (600 . 0-Y (3) ) / (600 . 0-500 . 0) 
end  if 
end  if 

if  ( (Y (4)  .LT.  60.0)  .OR.  (Y(4)  .GT.  75.0))  then 
d=0 . 0 
else 

if  (Y (4)  .LT.  67.5)  then 
d=d* (Y (4) -60 . 0) / (67 . 5-60 . 0) 
else 

d=d* (75 . 0-Y (4) ) / (75 . 0-67 . 5) 
end  if 
end  if 

d=sqrt (sqrt (d) ) 

F=d* (-1.0) 
end 

i 

subroutine  dssphere(M,  ME,  N,  X,  ACTIVE,  F,  G) 

! This  is  used  by  the  maximization  algorithm. 

! It  returns  the  desirability  based  on  the 
! simulated  data,  with  the  added  nonlinear 
! constraint  that  the  solution  must  be  within 
! a radius  of  1.633. 
integer  N,  M,  ME 

real  X(*) , F,  G(*),  Y(4),  d,  dist 
logical  ACTIVE (*) 
call  gety(X,Y) 
if  (Y (1)  .LT.  120.0)  then 
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d=0 . 0 
else 

if  (Y(l)  .GT.  170.0)  then 
d=l . 0 
else 

d=(Y(l) -120.0)/ (170. 0-120.0) 
end  if 
end  if 

if  (Y (2)  .LT.  1000.0)  then 
d=0 . 0 
else 

if  (Y (2)  .GT.  1300.0)  then 
d=d*l . 0 
else 

d=d* (Y (2) -1000 .0)/ (1300 . 0-1000 . 0) 
end  if 
end  if 

if  ( (Y (3)  .LT.  400.0)  .OR.  (Y(3)  .GT.  600.0))  then 
d=0 . 0 
else 

if  (Y (3)  .LT.  500.0)  then 

d=d* (Y (3) -400 . 0) / (500 . 0-400 . 0) 
else 

d=d* (600 . 0-Y (3) ) / (600 . 0-500 . 0) 
end  if 
end  if 

if  ( (Y (4)  .LT.  60.0)  .OR.  (Y(4)  .GT.  75.0))  then 
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d=0 . 0 
else 

if  (Y (4)  .LT.  67.5)  then 
d=d* (Y (4) -60 . 0) / (67 . 5-60 . 0) 
else 

d=d* (75 . 0-Y (4) ) / (75 . 0-67 . 5) 
end  if 
end  if 

d=sqrt (sqrt (d) ) 

F=d* (-1 . 0) 

dist=sqrt (X(l) **2+X(2) **2+X(3) **2) 
if  (ACTIVE (1) ) G(l)=dist 
if  (ACTIVE (2) ) G(2)=l .633-dist 
end 

i 

subroutine  trueds(N,  X,  F) 

! This  returns  the  true  desirability  obtained 
! from  the  fit  of  the  original  Derringer  and 
! Suich  data, 
integer  N 

real  X(N),  F,  Y(4),  d 
call  fittedy(X,Y) 
if  (Y (1)  .LT.  120.0)  then 
d=0 . 0 
else 

if  (Y (1)  .GT.  170.0)  then 


d=l  .0 
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else 

d=(Y(l) -120.0)/ (170. 0-120.0) 
end  if 
end  if 

if  (Y (2)  .LT.  1000.0)  then 
d=0 . 0 
else 

if  (Y (2)  .GT.  1300.0)  then 
d=d*l . 0 
else 

d=d* (Y (2) -1000 . 0) / ( 1300 . 0-1000 . 0) 
end  if 
end  if 

if  ( (Y (3)  .LT.  400.0)  .OR.  (Y(3)  .GT. 

d=0 . 0 
else 

if  (Y (3)  .LT.  500.0)  then 

d=d* (Y (3) -400 . 0) / (500 . 0-400 . 0) 
else 

d=d* (600 . 0-Y (3) ) / (600 . 0-500 . 0) 
end  if 
end  if 

if  ( (Y (4)  .LT.  60.0)  .OR.  (Y(4)  .GT. 

d=0 . 0 
else 

if  (Y (4)  .LT.  67.5)  then 


600.0))  then 


75.0))  then 


d=d* (Y (4) -60 . 0) / (67 . 5-60 . 0) 
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else 

d=d* (75 . 0-Y (4) ) / (75 . 0-67 . 5) 
end  if 
end  if 

d=sqrt (sqrt (d) ) 

F=d* (-1 . 0) 
end 

i 

subroutine  f illxmat (xdesign) 

! Some  of  the  fits  and  estimations  need  the 
! full  design  matrix,  which  this  generates, 
common  /actual/  ymat(20,4),  xmatrix(20,9) 
real  xdesign (20 ,3) 
integer  i 
do  i=l,20 

xmatrix(i , l)=xdesign(i , 1) 
xmatrix(i,2)=xdesign(i,2) 
xmatr ix ( i , 3) =xdesign ( i , 3) 
xmatrix(i ,4)=xdesign(i , 1) **2 
xmatrix(i ,5)=xdesign(i ,2)**2 
xmatr ix (i , 6) =xdesign (i , 3) **2 
xmatrix(i ,7)=xdesign(i , 1) *xdesign(i , 2) 
xmatrix(i ,8)=xdesign(i , 1) *xdesign(i ,3) 
xmatrix(i ,9)=xdesign(i ,2)*xdesign(i ,3) 
end  do 
end 

i 
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subroutine  f illxdes (xdesign) 

! This  puts  the  design  points  into  a matrix. 

! It  should  be  in  a file,  but  this  was  easier 
! given  my  time  constraints, 
real  xdesign(20,3) 
xdesign(l , 1)=-1 . 0 
xdesign (1 ,2)=-l .0 
xdesign(l ,3)=1 . 0 
xdesign(2, 1)  = 1 . 0 
xdesign(2,2)=-l .0 
xdesign(2,3)=-l .0 
xdesign(3, 1)=-1 .0 
xdesign (3 , 2) =1 . 0 
xdesign(3,3)=-l . 0 
xdesign (4, 1)  = 1 . 0 
xdesign(4,2)  = l . 0 
xdesign (4 , 3) =1 . 0 
xdesign (5, 1)=-1 . 0 
xdesign(5,2)=-l  .0 
xdesign(5,3)=-l .0 
xdesign(6,l)=1.0 
xdesign (6, 2)=-l . 0 
xdesign (6 ,3)=1 . 0 
xdesign (7, 1)=-1 . 0 
xdesign (7 , 2)=1 . 0 
xdesign(7,3)=l .0 
xdesign (8, 1)=1 . 0 
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xdesign (8 , 2) =1 . 0 
xdesign(8,3)=-l .0 
xdesign(9, 1)=-1.633 
xdesign (9 ,2) =0 . 0 
xdesign(9 , 3) =0 . 0 
xdesign (10, 1)=1 . 633 
xdesign(10,2)=0.0 
xdesign(10,3)=0.0 
xdesign (11 , 1)=0 . 0 
xdesign (11 ,2) =-l . 633 
xdesign(ll,3)=0.0 
xde  sign(12,l)=0.0 
xdesign(12,2)=l .633 
xdesign ( 12 , 3) =0 . 0 
xdesign (13, 1)=0 . 0 
xdesign(13,2)=0 . 0 
xdesign(13,3)=-l .633 
xdesign( 14 , 1) =0 . 0 
xdesign ( 14 , 2) =0 . 0 
xdesign (14, 3) =1 .633 
xdesign (15, 1)=0.0 
xdesign (15, 2) =0.0 
xdesign ( 15 , 3) =0 . 0 
xde  sign(16,l)=0.0 
xdesign (16, 2) =0.0 
xdesign (16, 3) =0.0 
xdesign (17, 1)=0.0 


xdesign(17,2)=0. 
xdesign(17,3)=0 . 
xdesign(18, 1)=0 . 
xdesign(18,2)=0. 
xdesign(18 ,3)=0 . 
xdesign(19 , 1)=0 . 
xdesign(19,2)=0. 
xdesign(19,3)=0. 
xdesign(20,l)=0. 
xdesign(20 ,2)=0 . 
xdesign (20 , 3) =0 . 


0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 


end 


APPENDIX  C 

BASIC  OUTLINE  OF  AN  OPTIMIZATION  EXPERIMENT 


The  following  is  a general  description  of  all  of  the  steps  necessary  to  carry 
out  an  optimization  experiment,  whether  it  has  one  response  of  interest  or  multiple 
responses  of  interest. 

Select  the  Responses  of  Interest  and  the  Control  Factors 
An  optimization  experiment  begins  by  identifying  the  responses  of  interest 
and  what  control  factors  may  affect  them.  The  responses  of  interest  are  typically 
the  important  quality  characteristics  of  the  product  being  investigated.  Information 
gathered  from  prior  experimentation  or  knowledge  of  the  process  being  investigated 
will  help  to  identify  control  factors  that  affect  the  responses  of  interest.  For  each  of 
the  responses  of  interest,  upper  and  lower  specification  limits  will  also  need  to  be 
chosen. 


Select  an  Appropriate  Experimental  Design 
The  appropriate  experimental  design  depends  on  the  functional  form  that  is 
assumed  to  exist  between  the  responses  of  interest,  the  yi,  and  levels  of  the  control 
factors,  the  Xj.  Most  optimization  experiments  assume  that  the  responses  of  interest 
can  be  estimated  by  second  order  polynomial  models  of  the  control  factors.  This 
model  can  be  written  as 

p p p 

Vi  = A)  T ^ ^ T ^ ^ ^ ' fiijXiXj  + Cj, 

2—1  2—1  j>i 

where  the  f3  are  unknown  regression  parameters,  e is  an  error  term,  and  p is  the 
number  of  control  factors.  If  a second  order  polynomial  model  is  assumed  to  be 
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true,  the  experiment  must  have  at  least  three  levels  of  each  of  the  control  variables. 
Popular  response  surface  experimental  designs  that  can  support  the  fit  of  a second 
order  polynomial  model  are  the  central  composite  design  and  the  33  factorial,  which 
are  detailed  in  books  on  response  surface  methods,  such  as  those  by  Myers  and 
Montgomery  (1995)  or  Khuri  and  Cornell  (1996). 

If  the  variance  of  one  or  more  of  the  responses  also  depends  on  the  levels 
of  the  control  factors,  some  of  the  experimental  points  will  have  to  be  replicated 
in  order  to  obtain  an  estimate  of  the  variance  at  those  points.  The  number  of 
experimental  points  that  need  to  be  replicated  will  depend  on  the  functional  form 
that  is  assumed  exist  between  the  variance  and  the  levels  of  the  control  factors.  A 
good  treatment  of  the  issues  involved  with  this  can  be  found  in  Vining  and  Schaub 
(1996). 

Once  the  form  of  the  experimental  design  is  chosen,  appropriate  levels  of  the 
response  variables  must  be  chosen.  The  choice  of  levels  may  be  made  based  on  prior 
experimentation  or  knowledge  of  the  process  being  investigated.  The  highest  and 
lowest  level  of  the  control  variable  should  be  made  as  far  apart  as  is  feasible,  with 
the  other  levels  determined  by  the  choice  of  the  experimental  design. 

Estimation  of  the  Regression  Parameters 

Once  the  experimental  design  and  appropriate  levels  of  the  control  factors  are 
chosen,  the  experimental  points  can  be  properly  randomized  and  the  experiment  run. 
The  data  from  the  experiment  are  then  used  to  estimate  the  unknown  regression 
parameters  /3.  The  estimates  are  generally  obtained  from  computer  packages,  such 
as  Minitab,  SAS,  Splus,  Microsoft  Excel,  etc.  Any  other  unknown  quantities,  such 
as  the  variance-covariance  matrix,  will  also  be  estimated  from  the  experimental  data 
using  similar  software  packages. 
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Maximizing  the  Probability  of  Being  Within  Specification 
Appendix  A contains  details  for  robust  parameter  design  situations,  where 
there  is  only  one  response  of  interest  whose  variance  depends  on  the  levels  of  the 
control  variables.  Only  minor  changes  to  the  description  given  in  appendix  A would 
be  necessary  for  a new  optimization  experiment.  The  equations  in  appendix  A that 
define  the  mean  and  the  standard  deviation  (equations  A.l  and  A. 2,  respectively) 
would  have  to  match  the  regression  equations  obtained  from  the  new  experiment. 

Appendix  B contains  a Fortran  program  that  can  be  modified  to  solve  the 
problem  for  situations  where  there  is  more  than  one  response  of  interest.  For  the 
multiple  response  optimization  situations,  where  there  is  more  than  one  response 
whose  variances  do  not  depend  on  the  levels  of  the  control  variables,  a number  of 
changes  would  have  to  be  made  to  the  program  given  in  appendix  B.  The  numbers 
given  in  appendix  B match  the  numbers  obtained  from  the  tire  tread  example  given 
in  chapter  4.  Specifically,  the  changes  necessary  would  be: 

• The  parameters  ny  and  N would  have  to  be  specified  correctly  in  the  main 
program.  The  parameter  ny  should  equal  the  number  of  response  variables, 
while  the  parameter  N should  equal  the  number  of  control  variables.  The  ny 
parameter  also  needs  to  be  set  in  the  subroutine  mprob.  The  lines  in  question 
are 

parameter (ny=4) 
parameter  (N=3) 

since  the  example  has  4 responses  of  interest  and  3 control  parameters. 

• The  upper  and  lower  bounds  for  the  control  parameters  are  set  in  a data  line 
for  XLB  and  XIJB  in  the  main  program.  The  line  in  question  is 

DATA  XLB/-1. 633, -1.633, -1.633/,  XUB/1 . 633, 1 . 633, 1 . 633/ 
since  the  limits  of  the  example  are  ±1.633. 
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• The  starting  point  for  the  algorithm  is  set  in  a data  line  for  XGUESS  in  the 
main  program.  The  line  in  question  is 

DATA  XGUESS/0 . 5 , 0.5,  -0.5/ 

since  the  starting  point  used  in  the  example  was  x'  = (0.5, 0.5, —0.5).  The 
experimenter  should  use  a number  of  different  starting  points  from  different 
areas  of  the  experimental  region.  This  will  help  to  eliminate  the  possibility  that 
the  algorithm  settles  on  a local  maximum,  rather  than  the  global  maximum. 

• The  upper  triangular  portion  of  the  correlation  matrix  must  be  given  in  a data 
line  for  sig  in  the  mprob  subroutine.  Note  that  this  is  the  correlation  matrix, 
not  the  variance-covariance  matrix.  The  correlation  matrix  must  have  diagonal 
elements  equal  to  1.  The  lines  in  question  are 

data  sig/0. 01885893,  -0.02720418,  -0.22046146,  0.15876509, 

& 0.072886,  -0.05203539/ 

since  this  is  the  correlation  matrix  from  the  example.  The  correlation  matrix 
can  be  obtained  from  statistical  packages,  such  as  Minitab,  Splus,  or  SAS. 

• The  bounds  for  each  response  variable,  as  well  as  the  type  of  response  variable 
must  be  set  in  the  mprob  subroutine.  The  upper  and  lower  bounds  are  set  in 
a data  line  for  yu  and  yl,  respectively.  The  type  of  response  is  set  in  a data 
line  for  inf.  If  the  response  is  to  be  maximized,  inf  should  be  set  to  0.  If  the 
response  is  to  be  minimized,  inf  should  be  set  to  1.  If  the  response  has  a specific 
target,  inf  should  be  set  to  2.  The  lines  in  question  are 

data  inf /0, 0,2, 2/ 

data  yl  /120. 0,1000. 0,400. 0,60.0/,  yu  /500. 0,1500. 0,600. 0,75.0/ 
since  these  are  the  limits  of  the  four  responses  from  the  example  given.  In  the 
example,  the  first  two  responses  were  to  be  maximized,  thus  the  first  two  values 
of  inf  are  0,  and  the  last  two  responses  had  targets,  thus  the  next  two  values  of 


inf  are  2. 
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• The  standard  deviations  of  each  of  the  responses  must  be  entered  in  a data  line 
for  stdev  in  the  mprob  subroutine.  The  standard  deviation  is  typically  equal  to 
the  mean  squared  error  from  the  regression  analysis.  Also  note  that  if  S is  a 
diagonal  matrix  of  the  standard  deviations  of  the  responses,  then  the  variance- 
covariance  matrix  can  be  obtained  by  pre-  and  post-multiplying  the  correlation 
matrix  by  S.  The  line  in  question  is 

data  stdev  /5.6112,  328.69,  20.549,  1.2674/ 
since  these  were  the  standard  deviations  from  the  example  for  the  four  responses. 

• The  regression  equations  for  each  of  the  responses  y(i ) must  be  specified  in  the 
subroutine  evaluate.  The  first  equation  from  this  subroutine  begins 

y (1)  = 139. 1192387+16. 4936447*x(l)+17 . 8807651*x(2) + 
which  matches  the  coefficients  in  table  4.2. 

The  multiple  response  robust  parameter  design  situation,  where  there  is  more 
than  one  response  of  interest,  and  the  variance-covariance  matrix  depends  on  the 
levels  of  the  control  factors,  can  be  solved  using  a Fortran  program  similar  to  that 
in  appendix  B.  An  additional  change  to  those  listed  above  would  be: 

• The  standard  deviations  would  not  be  entered  into  a data  line  in  the  mprob 
subroutine,  rather  they  would  have  their  own  separate  regression  equations. 
These  equations  could  be  put  into  the  evaluate  subroutine  or  into  a separate 


subroutine. 
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